Abstract

We establish the large deviations asymptotic performance (error exponent) of consensus+innovations 
distributed detection over random networks with generic (non-Gaussian) sensor observations. At each 
time instant, sensors 1) combine their own decision variables with those of their neighbors (consensus) and
2) assimilate their new observations (innovations). This paper shows, for general non-Gaussian distributions,
that consensus+innovations distributed detection exhibits a phase transition with respect to
the network degree of connectivity. Above a threshold, distributed detection is as good as centralized detection, with the same
optimal asymptotic detection performance, but, below the threshold, distributed detection is suboptimal 
with respect to centralized detection. We determine this threshold and quantify the performance loss 
below threshold. Finally, we show the dependence of the threshold and performance on the distribution 
of the observations: distributed detectors over the same random network, but with different observations' 
distributions, for example, Gaussian, Laplace, or quantized, may have different asymptotic performance, 
even when the corresponding centralized detectors have the same asymptotic performance. 

Keywords: Consensus+innovations, performance analysis, Chernoff information, non-Gaussian distributions, distributed

I. Introduction

Consider a distributed detection scenario where $N$ sensors are connected by a generic network with
intermittently failing links. The sensors perform consensus+innovations distributed detection; in other
words, at each time $k$, each sensor $i$ updates its local decision variable $x_i(k)$ by: 1) sensing and processing
a new measurement to create an intermediate variable; and 2) weight-averaging it with its neighbors'
intermediate decision variables. We showed in [1] that, when the sensor observations are Gaussian, the
consensus+innovations distributed detector exhibits a phase transition. When the network connectivity is
above a threshold, the distributed detector is asymptotically optimal, i.e., asymptotically equivalent
to the optimal centralized detector that collects the observations of all sensors.

This paper establishes the asymptotic performance of distributed detection over random networks for 
generic, non-Gaussian sensor observations. We adopt as asymptotic performance measure the exponential 
decay rate of the Bayes error probability (error exponent). We show that phase transition behavior emerges 
with non-Gaussian observations and demonstrate how the optimality threshold is a function of the log- 
moment generating function of the sensors' observations and of the number of sensors N. This reveals a 
very interesting interplay between the distribution of the sensor observations (e.g., Gaussian or Laplace) 
and the rate of diffusion (or connectivity) of the network (measured by a parameter $|\log r| \in [0, \infty)$
defined in Section II): for a network with the same connectivity, a distributed detector with, say, Laplace
observation distributions may match the optimal asymptotic performance of the centralized detector,
while the distributed detector for Gaussian observations may be suboptimal, even though the centralized 
detectors for the two distributions, Laplace and Gaussian, have the same optimal asymptotic performance. 

For distributed detection, we determine the range of the detection threshold $\gamma$ for which each sensor
achieves exponentially fast decay of the error probability (strictly positive error exponent), and we find the
optimal $\gamma$ that maximizes the error exponent. Interestingly, above the critical (phase transition) value for
the network connectivity $|\log r|$, the optimal detector threshold is $\gamma = 0$, mimicking the (asymptotically)
optimal threshold for the centralized detector. However, below the critical connectivity, we show by a
numerical example that the optimal distributed detector threshold might be non-zero.

Brief review of the literature. Distributed detection has been extensively studied in the context of
parallel fusion architectures, e.g., [2], [3], [4], [5], [6], [7], [8], consensus-based detection [9], [10], [11],
[12], and, more recently, consensus+innovations distributed inference, see, e.g., [13], [14], [15], [16], [17]
for distributed estimation, and [18], [19], [20], [21], [22], [23], [24] for distributed detection. Different
variants of consensus+innovations distributed detection algorithms have been proposed; we analyze here
running consensus, the variant in [20].

Reference [20] considers asymptotic optimality of running consensus, but in a framework that is very
different from ours. Reference [20] studies the asymptotic performance of the distributed detector where
the means of the sensor observations under the two hypotheses become closer and closer (vanishing signal
to noise ratio (SNR)), at the rate of $1/\sqrt{k}$, where $k$ is the number of observations. For this problem,
there is an asymptotic, non-zero, probability of miss and an asymptotic, non-zero, probability of false
alarm. Under these conditions, running consensus is as efficient as the optimal centralized detector, [25],
as long as the network is connected on average. Here, we assume that the means of the distributions
stay fixed as $k$ grows. We establish, through large deviations, the rate (error exponent) at which the error
probability decays to zero as $k$ goes to infinity. We show that connectedness on average is not sufficient
for running consensus to achieve the optimality of centralized detection; rather, a phase change occurs,
with distributed detection becoming as good as centralized detection when the network connectivity, measured by $|\log r|$,
exceeds a certain threshold.

We distinguish this paper from our prior work on the performance analysis of running consensus. 
In [26], we studied deterministically time varying networks and Gaussian observations, and in [27], we 
considered a different consensus+innovations detector with Gaussian observations and additive communication
noise. Here, we consider random networks, non-Gaussian observations, and noiseless communications.
Reference [1] considers random networks and Gaussian, spatially correlated observations. In
contrast, here the observations are non-Gaussian and spatially independent. We proved our results in [1] by
using the quadratic nature of the Gaussian log-moment generating function. For general non-Gaussian 
observations, the log-moment generating function is no longer quadratic, and the arguments in [1] no 
longer apply; we develop a more general methodology that establishes the optimality threshold in terms 
of the log-moment generating function of the log-likelihood ratio. We derive our results from generic 
properties of the log-moment generating function like convexity and zero value at the origin. Finally, 
while reference [1] and our other prior work considered the zero detection threshold $\gamma = 0$, here we extend
the results to generic detection thresholds $\gamma$. Our analysis reveals that, when $|\log r|$ is above its critical
value, the zero detector threshold $\gamma = 0$ is (asymptotically) optimal. When $|\log r|$ is below the critical
value, we compute the best detector threshold $\gamma = \gamma^*$, which may be non-zero in general.

Our analysis shows the impact of the distribution of the sensor observations on the performance of 
distributed detection: distributed detectors (with different distributions of the sensors' observations) can
have different asymptotic performance, even though the corresponding centralized detectors are equivalent, 
as we will illustrate in detail in Section IV. 



April 17, 2012 



DRAFT 






Paper outline. Section II introduces the network and sensor observations models and presents the
consensus+innovations distributed detector. Section III presents and proves our main results on the
asymptotic performance of the distributed detector. For a cleaner exposition, this section proves the results
for (spatially) identically distributed sensor observations. Section IV illustrates our results on several
types of sensor observation distributions, namely, Gaussian, Laplace, and discrete valued distributions,
discussing the impact of these distributions on distributed detection performance. Section V extends our
main results to non-identically distributed sensors' observations. Finally, Section VI concludes the paper.

Notation. We denote by: $A_{ij}$ the $(i,j)$-th entry of a matrix $A$; $a_i$ the $i$-th entry of a vector $a$; $I$, $1$,
and $e_i$, respectively, the identity matrix, the column vector with unit entries, and the $i$-th column of $I$;
$J$ the $N \times N$ ideal consensus matrix $J := (1/N) 1 1^\top$; $\|\cdot\|_l$ the vector (respectively, matrix) $l$-norm of
its vector (respectively, matrix) argument; $\|\cdot\| = \|\cdot\|_2$ the Euclidean (respectively, spectral) norm of
its vector (respectively, matrix) argument; $\mu_i(\cdot)$ the $i$-th largest eigenvalue; $E[\cdot]$ and $P(\cdot)$ the expected
value and probability operators, respectively; $\mathcal{I}_{\mathcal{A}}$ the indicator function of the event $\mathcal{A}$; $\nu^N$ the product
measure of $N$ i.i.d. observations drawn from the distribution with measure $\nu$; $h'(z)$ and $h''(z)$ the first
and the second derivatives of the function $h$ at point $z$.



II. Problem formulation

This section introduces the sensor observations model, reviews the optimal centralized detector, and
presents the consensus+innovations distributed detector. The section also reviews relevant properties of
the log-moment generating function of a sensor's log-likelihood ratio that are needed in the sequel.

A. Sensor observations model

We study the binary hypothesis testing problem $H_1$ versus $H_0$. We consider a network of $N$ nodes
where $Y_i(t)$ is the observation of sensor $i$ at time $t$, where $i = 1, \dots, N$, $t = 1, 2, \dots$

Assumption 1 The sensors' observations $\{Y_i(t)\}$ are independent and identically distributed (i.i.d.) both
in time and in space, with distribution $\nu_1$ under hypothesis $H_1$ and $\nu_0$ under $H_0$:

$$ Y_i(t) \sim \begin{cases} \nu_1, & H_1 \\ \nu_0, & H_0 \end{cases}, \quad i = 1, \dots, N, \; t = 1, 2, \dots \tag{1} $$

The distributions $\nu_1$ and $\nu_0$ are mutually absolutely continuous, distinguishable measures. The prior
probabilities $\pi_1 = P(H_1)$ and $\pi_0 = P(H_0) = 1 - \pi_1$ are in $(0, 1)$.

By spatial independence, the joint distribution of the observations of all sensors

$$ Y(t) := (Y_1(t), \dots, Y_N(t))^\top \tag{2} $$

at any time $t$ is $\nu_1^N$ under $H_1$ and $\nu_0^N$ under $H_0$. Our main results in Section III are derived under
Assumption 1. Section V extends them to non-identical (but still independent) sensors' observations.

B. Centralized detection, log-moment generating function (LMGF), and optimal error exponent 
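The log-likelihood ratio of sensor $i$ at time $t$ is $L_i(t)$, given by

$$ L_i(t) = \log \frac{f_1(Y_i(t))}{f_0(Y_i(t))}, $$

where $f_l(\cdot)$, $l = 0, 1$, is 1) the probability density function corresponding to $\nu_l$, when $Y_i(t)$ is an
absolutely continuous random variable; or 2) the probability mass function corresponding to $\nu_l$, when
$Y_i(t)$ is discrete valued.

Under Assumption 1, the log-likelihood ratio test for $k$ time observations from all sensors, for a
threshold $\gamma$, is:¹

$$ D(k) := \frac{1}{Nk} \sum_{t=1}^{k} \sum_{i=1}^{N} L_i(t) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \gamma. \tag{3} $$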

Log-moment generating function (LMGF). We introduce the LMGF of Li(t) and its properties that 
play a major role in assessing the performance of distributed detection. 

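Let $\Lambda_l$ ($l = 0, 1$) denote the LMGF for the log-likelihood ratio under hypothesis $H_l$:

$$ \Lambda_l : \mathbb{R} \to (-\infty, +\infty], \quad \Lambda_l(\lambda) = \log E\left[ e^{\lambda L_1(1)} \,\middle|\, H_l \right]. \tag{4} $$

In (4), $L_1(1)$ replaces $L_i(t)$, for arbitrary $i = 1, \dots, N$, and $t = 1, 2, \dots$, due to the spatially and temporally
identically distributed observations; see Assumption 1.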

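Lemma 1 Consider Assumption 1. For $\Lambda_0$ and $\Lambda_1$ in (4) the following holds:

(a) $\Lambda_0$ is convex;

(b) $\Lambda_0(\lambda) \in (-\infty, 0)$, for $\lambda \in (0, 1)$, $\Lambda_0(0) = \Lambda_0(1) = 0$, and $\Lambda_l'(0) = E[L_1(1) | H_l]$, $l = 0, 1$;

(c) $\Lambda_1(\lambda)$ satisfies:

$$ \Lambda_1(\lambda) = \Lambda_0(\lambda + 1), \quad \text{for } \lambda \in \mathbb{R}. \tag{5} $$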

Proof: For a proof of (a) and (b), see [28]. Part (c) follows from the definitions of $\Lambda_0$ and $\Lambda_1$,
which we show here for the case when the distributions $\nu_1$ and $\nu_0$ are absolutely continuous (the proof
for discrete distributions is similar):

$$ \Lambda_1(\lambda) = \log E\left[ e^{\lambda L_1(1)} \,\middle|\, H_1 \right] = \log \int_{y \in \mathbb{R}} \left( \frac{f_1(y)}{f_0(y)} \right)^{\lambda} f_1(y)\, dy = \log \int_{y \in \mathbb{R}} \left( \frac{f_1(y)}{f_0(y)} \right)^{1+\lambda} f_0(y)\, dy = \Lambda_0(1 + \lambda). \qquad \blacksquare $$

¹ In (3), we re-scale the spatio-temporal sum of the log-likelihood ratios $L_i(t)$ by dividing the sum by $Nk$. Note that we can
do so without loss of generality, as the alternative test without re-scaling is: $\sum_{t=1}^{k} \sum_{i=1}^{N} L_i(t) \underset{H_0}{\overset{H_1}{\gtrless}} \gamma'$, with $\gamma' = Nk\gamma$.

We further assume that the LMGF of a sensor's observation is finite.

Assumption 2 $\Lambda_0(\lambda) < +\infty$, $\forall \lambda \in \mathbb{R}$.
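
The properties in Lemma 1 can be checked numerically on a small example. The sketch below is only an illustration under assumed values: the 3-point pmfs `P1` and `P0` are hypothetical and do not come from the paper, and the helper names `lmgf`, `Lambda0`, and `Lambda1` are introduced here for convenience. For a finite alphabet, the LMGF in (4) reduces to a finite sum.

```python
import math

# Hypothetical pmfs on a 3-point alphabet (illustrative values, not from the paper).
P1 = [0.5, 0.3, 0.2]  # pmf of Y under H1
P0 = [0.2, 0.3, 0.5]  # pmf of Y under H0


def lmgf(lam, p_true):
    """log E[exp(lam * L)], with L = log(P1(y) / P0(y)) and y drawn from p_true."""
    return math.log(sum(q * (a / b) ** lam for q, a, b in zip(p_true, P1, P0)))


def Lambda0(lam):
    """LMGF of the log-likelihood ratio under H0."""
    return lmgf(lam, P0)


def Lambda1(lam):
    """LMGF of the log-likelihood ratio under H1."""
    return lmgf(lam, P1)


# Lemma 1(b): zero at 0 and 1, negative in between.
print("Lambda0(0), Lambda0(1):", Lambda0(0.0), Lambda0(1.0))
print("Lambda0(0.5):", Lambda0(0.5))
# Lemma 1(c): Lambda1(lam) = Lambda0(lam + 1).
print("Lambda1(0.3) - Lambda0(1.3):", Lambda1(0.3) - Lambda0(1.3))
```

Because the alphabet is finite, Assumption 2 holds automatically for this example: every term in the sum is finite for any real $\lambda$.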

In the next two remarks, we give two classes of problems when Assumption 2 holds. 
Remark I. We consider the signal+noise model:

$$ Y_i(k) = \begin{cases} m + n_i(k), & H_1 \\ n_i(k), & H_0. \end{cases} \tag{6} $$

Here $m \neq 0$ is a constant signal and $n_i(k)$ is a zero-mean additive noise with density function $f_n(\cdot)$
supported on $\mathbb{R}$; we rewrite $f_n(\cdot)$, without loss of generality, as $f_n(y) = c\, e^{-g(y)}$, where $c > 0$ is
a constant. Then, the Appendix shows that Assumption 2 holds under the following mild technical
condition: either one of (7) or (8) and one of (9) or (10) hold:

$$ \lim_{y \to +\infty} \frac{g(y)}{|y|^{\tau_+}} = \rho_+, \quad \text{for some } \rho_+, \tau_+ \in (0, +\infty) \tag{7} $$

$$ \lim_{y \to +\infty} \frac{g(y)}{(\log(|y|))^{\mu_+}} = \rho_+, \quad \text{for some } \rho_+ \in (0, +\infty), \; \mu_+ \in (1, +\infty) \tag{8} $$

$$ \lim_{y \to -\infty} \frac{g(y)}{|y|^{\tau_-}} = \rho_-, \quad \text{for some } \rho_-, \tau_- \in (0, +\infty) \tag{9} $$

$$ \lim_{y \to -\infty} \frac{g(y)}{(\log(|y|))^{\mu_-}} = \rho_-, \quad \text{for some } \rho_- \in (0, +\infty), \; \mu_- \in (1, +\infty). \tag{10} $$

In (8) and (10), we can also allow either (or both) of $\mu_+$, $\mu_-$ to equal 1, but then the corresponding $\rho$ is
in $(1, \infty)$. Note that $f_n(\cdot)$ need not be symmetric, i.e., $f_n(y)$ need not be equal to $f_n(-y)$. Intuitively,
the tail of the density $f_n(\cdot)$ behaves regularly, and $g(y)$ grows either like a polynomial of arbitrary
finite order in $y$, or slower, like a power $y^{\tau}$, $\tau \in (0, 1)$, or like a logarithm $c (\log y)^{\mu}$. The class of
admissible densities $f_n(\cdot)$ includes, e.g., power laws $c\, y^{-p}$, $p > 1$, or the exponential families $e^{\theta \phi(y) - A(\theta)}$,
$A(\theta) := \log \int_{y=-\infty}^{+\infty} e^{\theta \phi(y)} \chi(dy)$, with: 1) the Lebesgue base measure $\chi$; 2) the polynomial, power, or
logarithmic potentials $\phi(\cdot)$; and 3) the canonical set of parameters $\theta \in \Theta = \{\theta : A(\theta) < +\infty\}$, [29].

Remark II. Assumption 2 is satisfied if $Y_i(k)$ has arbitrary (different) distributions under $H_1$ and $H_0$
with the same, compact support; a special case is when $Y_i(k)$ is discrete, supported on a finite alphabet.

Centralized detection: Asymptotic performance. We consider briefly the performance of the centralized
detector that will benchmark the performance of the distributed detector. Denote by $\gamma_l := E[L_1(1) | H_l]$,
$l = 0, 1$. It can be shown [30] that $\gamma_0 < 0$ and $\gamma_1 > 0$. Now, consider the centralized detector in (3) with
constant thresholds $\gamma$, for all $k$, and denote by:

$$ \alpha(k, \gamma) = P(D(k) \geq \gamma \,|\, H_0), \quad \beta(k, \gamma) = P(D(k) < \gamma \,|\, H_1), \quad P_e(k, \gamma) = \alpha(k, \gamma)\pi_0 + \beta(k, \gamma)\pi_1, \tag{11} $$

respectively, the probability of false alarm, probability of miss, and Bayes (average) error probability. In
this paper, we adopt the minimum Bayes error probability criterion, both for the centralized and later
for our distributed detector, and, from now on, we refer to it simply as the error probability. A standard
result (Theorem 3.4.3 in [30]) states that, for any choice of $\gamma \in (\gamma_0, \gamma_1)$, the error probability decays
exponentially fast to zero in $k$. For $\gamma \notin (\gamma_0, \gamma_1)$, the error probability does not converge to zero at all.
To see this, assume that $H_1$ is true, and let $\gamma \geq \gamma_1$. Then, by noting that $E[D(k) | H_1] = \gamma_1$, for all $k$, we
have that $\beta(k, \gamma) = P(D(k) < \gamma \,|\, H_1) \geq P(D(k) < \gamma_1 \,|\, H_1) \to \frac{1}{2}$ as $k \to \infty$, by the central limit theorem.
Denote by $I_l(\cdot)$, $l = 0, 1$, the Fenchel-Legendre transform [30] of $\Lambda_l(\cdot)$:

$$ I_l(z) = \sup_{\lambda \in \mathbb{R}} \; \lambda z - \Lambda_l(\lambda), \quad z \in \mathbb{R}. \tag{12} $$

It can be shown [30] that $I_l(\cdot)$ is nonnegative, strictly convex, $I_l(\gamma_l) = 0$, for $l = 0, 1$, and $I_1(z) =
I_0(z) - z$. We now state the result on the centralized detector's asymptotic performance.
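
The stated properties of the Fenchel-Legendre transform (12) can be illustrated numerically. The following is a minimal sketch under assumptions: it uses the Gaussian shift-in-mean problem ($\mathcal{N}(m, \sigma^2)$ versus $\mathcal{N}(0, \sigma^2)$), for which $\Lambda_0(\lambda) = c\,\lambda(\lambda - 1)$ with $c = m^2/(2\sigma^2)$, so that $\gamma_0 = -c$ and $\gamma_1 = c$. The value $c = 0.5$, the search interval, and the helper name `legendre` are choices made here for illustration.

```python
C = 0.5  # c = m^2 / (2 sigma^2), assuming m = sigma = 1 (illustrative values)


def Lambda0(lam):
    """Gaussian LMGF of the log-likelihood ratio under H0."""
    return C * lam * (lam - 1.0)


def Lambda1(lam):
    """LMGF under H1, obtained from Lemma 1(c)."""
    return Lambda0(lam + 1.0)


def legendre(lmgf, z, lo=-50.0, hi=50.0, iters=200):
    """sup over lam in [lo, hi] of lam * z - lmgf(lam), by ternary search.

    The objective is concave in lam because lmgf is convex.
    """
    def objective(lam):
        return lam * z - lmgf(lam)

    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))


gamma0, gamma1 = -C, C
print("I0(gamma0):", legendre(Lambda0, gamma0))   # rate function vanishes at the mean
print("I1(gamma1):", legendre(Lambda1, gamma1))
print("I1(0.2) vs I0(0.2) - 0.2:", legendre(Lambda1, 0.2), legendre(Lambda0, 0.2) - 0.2)
```

For this quadratic LMGF the transform also has the closed form $I_0(z) = (z + c)^2/(4c)$, which gives an independent check on the numerical search.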

Lemma 2 Let Assumption 1 hold, and consider the family of centralized detectors (3) with constant
threshold $\gamma \in (\gamma_0, \gamma_1)$. Then, the best (maximal) error exponent:

$$ \lim_{k \to \infty} -\frac{1}{k} \log P_e(k, \gamma) $$

is achieved for the zero threshold $\gamma = 0$ and equals $N C_{\mathrm{ind}}$, where $C_{\mathrm{ind}} = I_0(0)$.

The quantity $C_{\mathrm{ind}}$ is referred to as the Chernoff information of a single sensor observation $Y_i(t)$. Lemma
2 says that the centralized detector's error exponent is $N$ times larger than an individual sensor's error
exponent. We remark that, even if we allow for time-varying thresholds $\gamma_k$, the error exponent $N C_{\mathrm{ind}}$
cannot be improved, i.e., the centralized detector with zero threshold is asymptotically optimal over all
detectors. We will see that, when a certain condition on the network connectivity holds, the distributed
detector is asymptotically optimal, i.e., achieves the best error exponent $N C_{\mathrm{ind}}$, and the zero threshold
is again optimal. However, when the network connectivity condition is not met, the distributed detector
is no longer asymptotically optimal, and the optimal threshold may be non-zero.

Proof of Lemma 2: Denote by $\Lambda_{0,N}$ the LMGF for the log-likelihood ratio $\sum_{i=1}^{N} L_i(t)$ for the
observations of all sensors at time $t$. Then, $\Lambda_{0,N}(\lambda) = N \Lambda_0(\lambda)$, by the i.i.d. in space assumption on the
sensors' observations. The Lemma now follows by the Chernoff lemma (Corollary 3.4.6, [30]):

$$ \lim_{k \to \infty} -\frac{1}{k} \log P_e(k, 0) = \max_{\lambda \in [0,1]} \{ -\Lambda_{0,N}(\lambda) \} = N \max_{\lambda \in [0,1]} \{ -\Lambda_0(\lambda) \} = N I_0(0). \qquad \blacksquare $$
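
The two steps in the proof of Lemma 2 can be sketched numerically: the additivity $\Lambda_{0,N} = N\Lambda_0$ and the Chernoff information as a maximization over $[0, 1]$. The pmfs below are hypothetical illustrative values (not from the paper), and the dense grid search in `chernoff_information` is a simple stand-in chosen here, not a method taken from the source.

```python
import math
from itertools import product

# Hypothetical pmfs on a 3-point alphabet (illustrative values, not from the paper).
P1 = [0.5, 0.3, 0.2]
P0 = [0.2, 0.3, 0.5]


def Lambda0(lam):
    """Single-sensor LMGF of the log-likelihood ratio under H0."""
    return math.log(sum(b * (a / b) ** lam for a, b in zip(P1, P0)))


def Lambda0_joint(lam, n):
    """LMGF under H0 of the sum of n independent sensors' log-likelihood ratios,
    computed by enumerating all joint outcomes."""
    total = 0.0
    for ys in product(range(len(P0)), repeat=n):
        prob = 1.0
        llr = 0.0
        for y in ys:
            prob *= P0[y]
            llr += math.log(P1[y] / P0[y])
        total += prob * math.exp(lam * llr)
    return math.log(total)


def chernoff_information(lmgf, grid=10001):
    """max over lam in [0, 1] of -lmgf(lam), evaluated on a dense grid."""
    return max(-lmgf(i / (grid - 1)) for i in range(grid))


n = 2
print("joint LMGF vs n * single:", Lambda0_joint(0.4, n), n * Lambda0(0.4))
c_ind = chernoff_information(Lambda0)
print("C_ind:", c_ind, " centralized exponent n * C_ind:", n * c_ind)
```

In this example `P1` is the reversal of `P0`, so $\Lambda_0(\lambda) = \Lambda_0(1 - \lambda)$ and the maximum is attained at $\lambda = 1/2$.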

C. Distributed detection algorithm

We now consider distributed detection when the sensors cooperate through a randomly varying network.
Specifically, we consider the running consensus distributed detector proposed in [20]. Each node $i$
maintains its local decision variable $x_i(k)$, which is a local estimate of the global optimal decision
variable $D(k)$ in (3). Note that $D(k)$ is not locally available. At each time $k$, each sensor $i$ updates $x_i(k)$
in two ways: 1) by incorporating its new observation $Y_i(k)$ to make an intermediate decision variable
$\frac{k-1}{k} x_i(k-1) + \frac{1}{k} L_i(k)$; and 2) by exchanging the intermediate decision variable locally with its neighbors
and computing the weighted average of its own and the neighbors' intermediate variables.

More precisely, the update of $x_i(k)$ is as follows:

$$ x_i(k) = \sum_{j \in O_i(k)} W_{ij}(k) \left( \frac{k-1}{k} x_j(k-1) + \frac{1}{k} L_j(k) \right), \quad k = 1, 2, \dots, \quad x_i(0) = 0. \tag{13} $$

Here $O_i(k)$ is the (random) neighborhood of sensor $i$ at time $k$ (including $i$), and $W_{ij}(k)$ are the (random)
averaging weights. The sensor $i$'s local decision test at time $k$ is:

$$ x_i(k) \; \underset{H_0}{\overset{H_1}{\gtrless}} \; \gamma, \tag{14} $$

i.e., $H_1$ (respectively, $H_0$) is decided when $x_i(k) \geq \gamma$ (respectively, $x_i(k) < \gamma$).

Write the consensus+innovations algorithm (13) in vector form. Let $x(k) = (x_1(k), x_2(k), \dots, x_N(k))^\top$
and $L(k) = (L_1(k), \dots, L_N(k))^\top$. Also, collect the averaging weights $W_{ij}(k)$ in the $N \times N$ matrix $W(k)$,
where, clearly, $W_{ij}(k) = 0$ if the sensors $i$ and $j$ do not communicate at time step $k$. The algorithm (13)
becomes:

$$ x(k) = W(k) \left( \frac{k-1}{k} x(k-1) + \frac{1}{k} L(k) \right), \quad k = 1, 2, \dots, \quad x(0) = 0. \tag{15} $$

Network model. We state the assumption on the random averaging matrices W{k). 

Assumption 3 The averaging matrices $W(k)$ satisfy the following:

(a) The sequence $\{W(k)\}_{k=1}^{\infty}$ is i.i.d.

(b) $W(k)$ is symmetric and stochastic (row-sums equal 1 and $W_{ij}(k) \geq 0$) with probability one, $\forall k$.

(c) There exists $\eta > 0$, such that, for any realization $W(k)$, $W_{ii}(k) \geq \eta$, $\forall i$, and $W_{ij}(k) \geq \eta$ whenever
$W_{ij}(k) > 0$, $i \neq j$.

(d) $W(k)$ and $Y(t)$ are mutually independent over all $k$ and $t$.

Condition (c) is mild and says that: 1) sensor i assigns a non-negligible weight to itself; and 2) when 
sensor i receives a message from sensor j, sensor i assigns a non-negligible weight to sensor j. 
Define the matrices $\Phi(k, t)$ by:

$$ \Phi(k, t) := W(k) W(k-1) \cdots W(t), \quad k \geq t \geq 1. \tag{16} $$

It is easy to verify from (15) that $x(k)$ equals:

$$ x(k) = \frac{1}{k} \sum_{t=1}^{k} \Phi(k, t) L(t), \quad k = 1, 2, \dots \tag{17} $$
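
The equivalence between the recursion (15) and the closed form (17) can be checked with a short simulation. This is a sketch under assumptions made here for illustration: the weight matrices follow a pairwise gossip model (one random pair averages at each step, which is symmetric and stochastic with $\eta = 1/2$), and the log-likelihood ratios are drawn as $\mathcal{N}(0.5, 1)$ samples, which corresponds to unit-variance Gaussian observations with a unit mean shift under $H_1$. The function names are hypothetical.

```python
import random


def gossip_matrix(n, rng):
    """Symmetric stochastic W for one gossip exchange: a random pair (i, j) averages."""
    i, j = rng.sample(range(n), 2)
    w = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    w[i][i] = w[j][j] = 0.5
    w[i][j] = w[j][i] = 0.5
    return w


def matvec(w, v):
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)] for i in range(n)]


def run(n=4, steps=30, seed=0):
    rng = random.Random(seed)
    ws, ls = [], []
    x = [0.0] * n
    for k in range(1, steps + 1):
        w = gossip_matrix(n, rng)
        l = [rng.gauss(0.5, 1.0) for _ in range(n)]  # assumed LLR samples under H1
        ws.append(w)
        ls.append(l)
        # Recursion (15): innovation step followed by weighted averaging.
        inter = [((k - 1) / k) * xi + li / k for xi, li in zip(x, l)]
        x = matvec(w, inter)
    # Closed form (17): x(k) = (1/k) * sum_t Phi(k, t) L(t), with Phi(k, t) = W(k)...W(t).
    closed = [0.0] * n
    phi = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for t in range(steps, 0, -1):
        phi = matmul(phi, ws[t - 1])
        contrib = matvec(phi, ls[t - 1])
        closed = [c + v / steps for c, v in zip(closed, contrib)]
    # Centralized decision variable D(k) in (3).
    d = sum(sum(l) for l in ls) / (n * steps)
    return x, closed, d


x, closed, d = run()
print("max |recursion - closed form|:", max(abs(a - b) for a, b in zip(x, closed)))
print("network average of x(k) vs D(k):", sum(x) / len(x), d)
```

The second printed line reflects that symmetric stochastic weights preserve the network average, so the average of the local decision variables always equals the centralized statistic $D(k)$; what the network connectivity controls is how close each individual $x_i(k)$ is to that average.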

Choice of threshold $\gamma$. We restrict the choice of threshold $\gamma$ to $\gamma \in (\gamma_0, \gamma_1)$, $\gamma_0 < 0$, $\gamma_1 > 0$, where we
recall $\gamma_l = E[L_1(1) | H_l]$, $l = 0, 1$. To see why, note that $W(t)$ is a stochastic matrix, hence $W(t) 1 = 1$, for all $t$, and
thus $\Phi(k, t) 1 = 1$. Also, $E[L(t) | H_l] = \gamma_l 1$, for all $t$, $l = 0, 1$. Now, by iterating expectation:

$$ E[x(k) | H_l] = E\big[ E[x(k) \,|\, H_l, W(1), \dots, W(k)] \big] = E\left[ \frac{1}{k} \sum_{t=1}^{k} \Phi(k, t)\, E[L(t) | H_l] \right] = \gamma_l 1, \quad l = 0, 1, $$

and $E[x_i(k) | H_l] = \gamma_l$, for all $i$, $k$. Moreover, it can be shown (proof is omitted due to lack of space) that
$x_i(k)$ converges in probability to $\gamma_l$ under $H_l$. Now, a similar argument as with the centralized detector
in II-B shows that for $\gamma \notin (\gamma_0, \gamma_1)$, the error probability does not converge to zero. We will show that,
for any $\gamma \in (\gamma_0, \gamma_1)$, the error probability converges to 0 exponentially fast, and we find the optimal
$\gamma = \gamma^*$ that maximizes a certain lower bound on the exponent of the error probability.

Network connectivity. From (17), we can see that the matrices $\Phi(k, t)$ should be as close to $J$ as possible
for enhanced detection performance. Namely, the ideal (unrealistic) case when $\Phi(k, t) = J$ for all $k$, $t$,
corresponds to the scenario where each sensor $i$ is equivalent to the optimal centralized detector. It is
well known that, under certain conditions, the matrices $\Phi(k, t)$ converge in probability to $J$:

$$ P(\|\Phi(k, t) - J\| > \epsilon) \to 0 \text{ as } (k - t) \to \infty, \quad \forall \epsilon > 0, $$

such that $P(\|\Phi(k, t) - J\| > \epsilon)$ vanishes exponentially fast in $(k - t)$, i.e., $P(\|\Phi(k, t) - J\| > \epsilon) \approx r^{k-t}$,
$r \in [0, 1]$. The quantity $r$ determines the speed of convergence of the matrices $\Phi(k, t)$. The closer $r$ is to
zero, the faster consensus is. We refer to $|\log r|$ as the network connectivity. We will see that the
distributed detection performance significantly depends on $r$. Formally, $|\log r| = -\log r$ is given by:²

$$ |\log r| := \lim_{(k-t) \to \infty} -\frac{1}{k - t} \log P(\|\Phi(k, t) - J\| > \epsilon). \tag{18} $$

For the exact calculation of $r$, we refer to [31]. Reference [31] shows that, for the commonly used
models of $W(k)$, gossip and link failure (links in the underlying network fail independently, with possibly
mutually different probabilities), $r$ is easily computable by solving a certain min-cut problem. In general,
$r$ is not easily computable, but all our results (Theorem 5, Corollary 6, Corollary 11) hold when $r$ is
replaced by an upper bound. An upper bound on $r$ is given by $\mu_2(E[W^2(k)])$, [31].

The following Lemma easily follows from (18).

Lemma 4 Let Assumption 3 hold. Then, for any $\delta > 0$, there exists a constant $C(\delta) \in (0, \infty)$ (independent
of $\epsilon \in (0, 1)$) such that:

$$ P(\|\Phi(k, t) - J\| > \epsilon) \leq C(\delta)\, e^{-(k-t)(|\log r| - \delta)}, \quad \text{for all } k \geq t. $$

III. Main results: Asymptotic analysis and error exponents for distributed detection

Subsection III-A states our main results on the asymptotic performance of consensus+innovations
distributed detection; subsection III-B proves these results.

A. Statement of main results 

In this section, we analyze the performance of distributed detection in terms of the detection error
exponent, when the number of observations (per sensor), or the size $k$ of the observation interval, tends
to $+\infty$. We show that there exists a threshold on the network connectivity $|\log r|$
such that, if $|\log r|$ is above this threshold, each node in the network achieves asymptotic optimality (i.e.,
the error exponent at each node is the total Chernoff information, equal to $N C_{\mathrm{ind}}$). When $|\log r|$ is below
the threshold, we give a lower bound for the error exponent. Both the threshold and the lower bound are
given solely in terms of the log-moment generating function $\Lambda_0$ and the number of sensors $N$. These
findings are summarized in Theorem 5 and Corollary 6 below.

Let $\alpha_i(k, \gamma)$, $\beta_i(k, \gamma)$, and $P_{e,i}(k, \gamma)$ denote the probability of false alarm, the probability of miss, and
the error probability, respectively, of sensor $i$ for the detector (13) and (14), for the threshold equal to $\gamma$:

$$ \alpha_i(k, \gamma) = P(x_i(k) \geq \gamma \,|\, H_0), \quad \beta_i(k, \gamma) = P(x_i(k) < \gamma \,|\, H_1), \quad P_{e,i}(k, \gamma) = \pi_0 \alpha_i(k, \gamma) + \pi_1 \beta_i(k, \gamma), \tag{19} $$

where, we recall, $\pi_1$ and $\pi_0$ are the prior probabilities.

² It can be shown that the limit in (18) exists and that it does not depend on $\epsilon$.

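Theorem 5 Let Assumptions 1-3 hold and consider the family of distributed detectors in (13) and (14)
with $\gamma \in (\gamma_0, \gamma_1)$. Let $\lambda_l^{s}$ be the zero of the function:

$$ \Delta_l(\lambda) := \Lambda_l(N\lambda) - |\log r| - N \Lambda_l(\lambda), \quad l = 0, 1, \tag{20} $$

and define $\gamma_l^-$, $\gamma_l^+$, $l = 0, 1$ by

$$ \gamma_0^- = \Lambda_0'(\lambda_0^{s}), \quad \gamma_0^+ = \Lambda_0'(N \lambda_0^{s}) > \gamma_0^- \tag{21} $$

$$ \gamma_1^- = \Lambda_1'(N \lambda_1^{s}), \quad \gamma_1^+ = \Lambda_1'(\lambda_1^{s}) > \gamma_1^-. \tag{22} $$

Then, for every $\gamma \in (\gamma_0, \gamma_1)$, at each sensor $i$, $i = 1, \dots, N$, we have:

$$ \liminf_{k \to \infty} -\frac{1}{k} \log \alpha_i(k, \gamma) \geq B_0(\gamma), \quad \liminf_{k \to \infty} -\frac{1}{k} \log \beta_i(k, \gamma) \geq B_1(\gamma), \tag{23} $$

where

$$ B_0(\gamma) = \begin{cases} N I_0(\gamma), & \gamma \in (\gamma_0, \gamma_0^-] \\ N I_0(\gamma_0^-) + N \lambda_0^{s} (\gamma - \gamma_0^-), & \gamma \in (\gamma_0^-, \gamma_0^+) \\ I_0(\gamma) + |\log r|, & \gamma \in [\gamma_0^+, \gamma_1) \end{cases} $$

$$ B_1(\gamma) = \begin{cases} I_1(\gamma) + |\log r|, & \gamma \in (\gamma_0, \gamma_1^-] \\ N I_1(\gamma_1^+) + N \lambda_1^{s} (\gamma - \gamma_1^+), & \gamma \in (\gamma_1^-, \gamma_1^+) \\ N I_1(\gamma), & \gamma \in [\gamma_1^+, \gamma_1). \end{cases} $$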
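
To make the quantities in Theorem 5 concrete, the sketch below evaluates the false alarm bound $B_0(\gamma)$ for an assumed Gaussian example, where $\Lambda_0(\lambda) = c\,\lambda(\lambda - 1)$ and $I_0(z) = (z + c)^2/(4c)$. The values $c = 0.5$, $N = 3$, and $r = 0.4$ are illustrative choices, and the bisection helper is introduced here only to locate the positive zero of $\Delta_0$ in (20). For this LMGF, $\Delta_0(\lambda) = c N (N-1) \lambda^2 - |\log r|$, so the zero is also available in closed form.

```python
import math

C = 0.5                           # c = m^2 / (2 sigma^2), assumed m = sigma = 1
N = 3                             # number of sensors (illustrative)
ABS_LOG_R = abs(math.log(0.4))    # network connectivity |log r| for r = 0.4


def Lambda0(lam):
    return C * lam * (lam - 1.0)


def dLambda0(lam):
    return C * (2.0 * lam - 1.0)


def I0(z):
    """Closed-form Fenchel-Legendre transform of the Gaussian Lambda0."""
    return (z + C) ** 2 / (4.0 * C)


def Delta0(lam):
    """The function in (20) for l = 0."""
    return Lambda0(N * lam) - ABS_LOG_R - N * Lambda0(lam)


def bisect(f, lo, hi, iters=200):
    """Zero of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


lam_s = bisect(Delta0, 0.0, 10.0)   # positive zero of Delta0
g_minus = dLambda0(lam_s)           # gamma_0^- in (21)
g_plus = dLambda0(N * lam_s)        # gamma_0^+ in (21)


def B0(gamma):
    """Lower bound on the false alarm exponent in Theorem 5."""
    if gamma <= g_minus:
        return N * I0(gamma)
    if gamma < g_plus:
        return N * I0(g_minus) + N * lam_s * (gamma - g_minus)
    return I0(gamma) + ABS_LOG_R


print("lambda_0^s:", lam_s, " closed form:", math.sqrt(ABS_LOG_R / (C * N * (N - 1))))
print("gamma_0^-, gamma_0^+:", g_minus, g_plus)
print("B0(0) vs N * C_ind:", B0(0.0), N * C / 4.0)
```

With these values $\gamma_0^- > 0$, so the zero threshold falls in the first branch and $B_0(0)$ equals the centralized exponent $N C_{\mathrm{ind}} = N c / 4$. The three branches also join continuously at $\gamma_0^-$ and $\gamma_0^+$, which follows from $\Delta_0(\lambda_0^{s}) = 0$.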

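Corollary 6 Let Assumptions 1-3 hold and consider the family of distributed detectors in (13) and (14)
parameterized by detector thresholds $\gamma \in (\gamma_0, \gamma_1)$. Then:

(a)

$$ \liminf_{k \to \infty} -\frac{1}{k} \log P_{e,i}(k, \gamma) \geq \min\{B_0(\gamma), B_1(\gamma)\} > 0, \tag{24} $$

and the lower bound in (24) is maximized for the point $\gamma^* \in (\gamma_0, \gamma_1)$³ at which $B_0(\gamma^*) = B_1(\gamma^*)$.

(b) Consider $\lambda^* = \arg\min_{\lambda \in \mathbb{R}} \Lambda_0(\lambda)$, and let:

$$ \mathrm{thr}(\Lambda_0, N) = \max\{ \Lambda_0(N\lambda^*) - N\Lambda_0(\lambda^*), \; \Lambda_0(1 - N(1 - \lambda^*)) - N\Lambda_0(\lambda^*) \}. \tag{25} $$

Then, when $|\log r| \geq \mathrm{thr}(\Lambda_0, N)$, each sensor $i$ with the detector threshold set to $\gamma = 0$ is
asymptotically optimal:

$$ \lim_{k \to \infty} -\frac{1}{k} \log P_{e,i}(k, 0) = N C_{\mathrm{ind}}. $$

(c) When $\Lambda_0(\lambda) = \Lambda_0(1 - \lambda)$, for $\lambda \in [0, 1]$, $\gamma^* = 0$, irrespective of the value of $r$ (even when $|\log r| <
\mathrm{thr}(\Lambda_0, N)$).

³ As we show in the proof, such a point exists and is unique.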
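
The connectivity threshold (25) is straightforward to evaluate once $\Lambda_0$ is available. The following sketch is an illustration under assumptions: `argmin_convex` is a ternary search introduced here (valid because $\Lambda_0$ is convex and its minimizer lies in $(0, 1)$ by Lemma 1), the Gaussian LMGF uses the assumed value $c = 0.5$, and the discrete pmfs are the 5-point distributions quoted for Figure 1. For the Gaussian case, $\lambda^* = 1/2$ and both terms in (25) equal $c N (N-1)/4$.

```python
import math


def argmin_convex(f, lo, hi, iters=200):
    """Minimizer of a convex function on [lo, hi], by ternary search."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) > f(b):
            lo = a
        else:
            hi = b
    return 0.5 * (lo + hi)


def thr(lmgf, n):
    """Connectivity threshold in (25) for LMGF `lmgf` and n sensors."""
    lam_star = argmin_convex(lmgf, 0.0, 1.0)
    base = n * lmgf(lam_star)
    return max(lmgf(n * lam_star) - base,
               lmgf(1.0 - n * (1.0 - lam_star)) - base)


def gaussian_lmgf(lam, c=0.5):
    """Assumed Gaussian example: c = m^2 / (2 sigma^2) with m = sigma = 1."""
    return c * lam * (lam - 1.0)


# 5-point discrete distributions quoted for Figure 1.
P1 = [0.2, 0.2, 0.2, 0.2, 0.2]
P0 = [0.01, 0.01, 0.01, 0.01, 0.96]


def discrete_lmgf(lam):
    return math.log(sum(b * (a / b) ** lam for a, b in zip(P1, P0)))


N = 3
print("Gaussian thr:", thr(gaussian_lmgf, N), " closed form:", 0.5 * N * (N - 1) / 4.0)
print("discrete thr:", thr(discrete_lmgf, N), " |log r| for r = 0.4:", abs(math.log(0.4)))
```

Comparing the printed threshold with $|\log r|$ indicates whether part (b) of Corollary 6 applies for a given network.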

Figure 1 (left) illustrates the error exponent lower bounds $B_0(\gamma)$ and $B_1(\gamma)$ in Theorem 5, while
Figure 1 (right) illustrates the quantities in (21). (See the definition of the function $\Phi_0(\lambda)$ in (36)
in the proof of Theorem 5.) We consider $N = 3$ sensors and a discrete distribution of $Y_i(t)$ over a
5-point alphabet, with the distribution [0.2, 0.2, 0.2, 0.2, 0.2] under $H_1$, and [0.01, 0.01, 0.01, 0.01, 0.96] under
$H_0$. We set here $r = 0.4$.

Fig. 1. Left: Illustration of the error exponent lower bounds $B_0(\gamma)$ and $B_1(\gamma)$ in Theorem 5; Right: Illustration of the function
$\Phi_0(\lambda)$ in (36), and the quantities in (21). We consider $N = 3$ sensors and a discrete distribution of $Y_i(t)$ over a 5-point alphabet,
with the distribution [0.2, 0.2, 0.2, 0.2, 0.2] under $H_1$, and [0.01, 0.01, 0.01, 0.01, 0.96] under $H_0$. We set here $r = 0.4$.

Corollary 6 states that, when the network connectivity $|\log r|$ is above a threshold, the distributed
detector in (13) and (14) is asymptotically equivalent to the optimal centralized detector. The corresponding
optimal detector threshold is $\gamma = 0$. When $|\log r|$ is below the threshold, Corollary 6 determines what
value of the error exponent the distributed detector can achieve, for any given $\gamma \in (\gamma_0, \gamma_1)$. Moreover,
Corollary 6 finds the optimal detector threshold $\gamma^*$ for a given $r$; $\gamma^*$ can be found as the unique zero of
the strictly decreasing function $\Delta B(\gamma) := B_1(\gamma) - B_0(\gamma)$ on $\gamma \in (\gamma_0, \gamma_1)$ (see the proof of Corollary 6),
e.g., by bisection on $(\gamma_0, \gamma_1)$.

Remark. When $\Lambda_0(\lambda) = \Lambda_0(1 - \lambda)$, for $\lambda \in [0, 1]$, it can be shown that $\gamma_0 = -\gamma_1 < 0$, and $B_0(\gamma) =
B_1(-\gamma)$, for all $\gamma \in (\gamma_0, \gamma_1)$. This implies that the point $\gamma^*$ at which $B_0$ and $B_1$ are equal is necessarily
zero, and hence the optimal detector threshold is $\gamma^* = 0$, irrespective of the network connectivity $|\log r|$
(even when $|\log r| < \mathrm{thr}(\Lambda_0, N)$). This symmetry holds, e.g., for the Gaussian and Laplace distributions
considered in Section IV.

Corollary 6 establishes that there exists a "sufficient" connectivity, say $|\log r^*|$, so that further improvement
of the connectivity (and further spending of resources, e.g., transmission power) does not lead to
a pay-off in terms of detection performance. Hence, Corollary 6 is valuable in the practical design of
a sensor network, as it says how much connectivity (resources) is sufficient to achieve asymptotically
optimal detection.

Equation (24) says that the distribution of the sensor observations (through the LMGF) plays a role in
determining the performance of distributed detection. We illustrate and explain this effect by examples
in Section IV.

B. Proofs of the main results 

We first prove Theorem 5.

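Proof of Theorem 5: Consider the probability of false alarm $\alpha_i(k, \gamma)$ in (19). We upper bound
$\alpha_i(k, \gamma)$ using the exponential Markov inequality [32] parameterized by $\zeta \geq 0$:

$$ \alpha_i(k, \gamma) = P(x_i(k) \geq \gamma \,|\, H_0) = P\left( e^{\zeta x_i(k)} \geq e^{\zeta \gamma} \,\middle|\, H_0 \right) \leq E\left[ e^{\zeta x_i(k)} \,\middle|\, H_0 \right] e^{-\zeta \gamma}. \tag{26} $$

Next, by setting $\zeta = N k \lambda$, with $\lambda \geq 0$, we obtain:

$$ \alpha_i(k, \gamma) \leq E\left[ e^{N k \lambda x_i(k)} \,\middle|\, H_0 \right] e^{-N k \lambda \gamma} \tag{27} $$

$$ = E\left[ e^{N \lambda \sum_{t=1}^{k} \sum_{j=1}^{N} \Phi_{i,j}(k, t) L_j(t)} \,\middle|\, H_0 \right] e^{-N k \lambda \gamma}. \tag{28} $$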


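The terms in the sum in the exponent in (28) are conditionally independent, given the realizations of the
averaging matrices $W(t)$, $t = 1, \dots, k$. Thus, by iterating the expectations, and using the definition of
$\Lambda_0$ in (4), we compute the expectation in (28) by conditioning first on $W(t)$, $t = 1, \dots, k$:

$$ E\left[ e^{N \lambda \sum_{t=1}^{k} \sum_{j=1}^{N} \Phi_{i,j}(k, t) L_j(t)} \,\middle|\, H_0 \right] = E\left[ E\left[ e^{N \lambda \sum_{t=1}^{k} \sum_{j=1}^{N} \Phi_{i,j}(k, t) L_j(t)} \,\middle|\, H_0, W(1), \dots, W(k) \right] \right] = E\left[ e^{\sum_{t=1}^{k} \sum_{j=1}^{N} \Lambda_0(N \lambda \Phi_{i,j}(k, t))} \right]. \tag{29} $$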



Partition of the sample space. We handle the random matrix realizations $W(t)$, $t = 1, \dots, k$, through
a suitable partition of the underlying probability space. Adapting an argument from [1], partition the
probability space based on the time of the last successful averaging. In more detail, for a fixed $k$,
introduce the partition $\mathcal{P}_k$ of the sample space that consists of the disjoint events $\mathcal{A}_{s,k}$, $s = 0, 1, \dots, k$,
given by:

$$ \mathcal{A}_{s,k} = \{ \|\Phi(k, s) - J\| \leq \epsilon \text{ and } \|\Phi(k, s+1) - J\| > \epsilon \}, $$

for $s = 1, \dots, k-1$, $\mathcal{A}_{0,k} = \{ \|\Phi(k, 1) - J\| > \epsilon \}$, and $\mathcal{A}_{k,k} = \{ \|\Phi(k, k) - J\| \leq \epsilon \}$. For simplicity
of notation, we drop the index $k$ in the sequel and denote event $\mathcal{A}_{s,k}$ by $\mathcal{A}_s$, $s = 0, \dots, k$, for $\epsilon > 0$.
Intuitively, the smaller $t$ is, the closer the product $\Phi(k, t)$ is to $J$; if the event $\mathcal{A}_s$ occurred, then the
largest $t$ for which the product $\Phi(k, t)$ is still $\epsilon$-close to $J$ equals $s$. We now show that $\mathcal{P}_k$ is indeed a
partition. We need the following simple Lemma. The Lemma shows that convergence of $\Phi(k, s) - J$ is
monotonic, for any realization of the matrices $W(1), W(2), \dots, W(k)$.

Lemma 7 Let Assumption 3 hold. Then, for any realization of the matrices $W(k)$:

$$ \|\Phi(k, s) - J\| \leq \|\Phi(k, t) - J\|, \quad \text{for } 1 \leq s \leq t \leq k. $$

Proof: Since every realization of $W(t)$ is stochastic and symmetric for every $t$, we have that $W(t) 1 =
1$ and $1^\top W(t) = 1^\top$, and so: $\Phi(k, s) - J = W(k) \cdots W(s) - J = (W(k) - J) \cdots (W(s) - J)$. Now,
using the sub-multiplicative property of the spectral norm, we get

$$ \|\Phi(k, s) - J\| = \|(W(k) - J) \cdots (W(t) - J)(W(t-1) - J) \cdots (W(s) - J)\| $$

$$ \leq \|(W(k) - J) \cdots (W(t) - J)\| \, \|W(t-1) - J\| \cdots \|W(s) - J\|. $$

To prove Lemma 7, it remains to show that $\|W(t) - J\| \leq 1$, for any realization of $W(t)$. To this
end, fix a realization $W$ of $W(t)$. Consider the eigenvalue decomposition $W = Q M Q^\top$, where $M =
\mathrm{diag}(\mu_1, \dots, \mu_N)$ is the matrix of eigenvalues of $W$, and the columns of $Q$ are the orthonormal eigenvectors.
As $\frac{1}{\sqrt{N}} 1$ is the eigenvector associated with eigenvalue $\mu_1 = 1$, we have that $W - J = Q M' Q^\top$,
where $M' = \mathrm{diag}(0, \mu_2, \dots, \mu_N)$. Because $W$ is stochastic, we know that $1 = \mu_1 \geq \mu_2 \geq \cdots \geq \mu_N \geq -1$,
and so $\|W - J\| = \max\{|\mu_2|, |\mu_N|\} \leq 1$. $\blacksquare$

To show that $\mathcal{P}_k$ is a partition, note first that (at least) one of the events $\mathcal{A}_0, \dots, \mathcal{A}_k$ necessarily occurs.
It remains to show that the events $\mathcal{A}_s$ are disjoint. We carry this out by fixing arbitrary $s = 1, \dots, k$, and
showing that, if the event $\mathcal{A}_s$ occurs, then $\mathcal{A}_t$, $t \neq s$, does not occur. Suppose that $\mathcal{A}_s$ occurs, i.e., the
realizations $W(1), \dots, W(k)$ are such that $\|\Phi(k, s) - J\| \leq \epsilon$ and $\|\Phi(k, s+1) - J\| > \epsilon$. Fix any $t > s$.
Then, event $\mathcal{A}_t$ does not occur, because, by Lemma 7, $\|\Phi(k, t) - J\| \geq \|\Phi(k, s+1) - J\| > \epsilon$. Now, fix
any $t < s$. Then, event $\mathcal{A}_t$ does not occur, because, by Lemma 7, $\|\Phi(k, t+1) - J\| \leq \|\Phi(k, s) - J\| \leq \epsilon$.
Thus, for any $s = 1, \dots, k$, if the event $\mathcal{A}_s$ occurs, then $\mathcal{A}_t$, for $t \neq s$, does not occur, and hence the
events $\mathcal{A}_s$ are disjoint.

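Using the total probability law over $\mathcal{P}_k$, the expectation (29) is computed by:

$$E\left[ e^{\sum_{t=1}^{k}\sum_{j=1}^{N} \Lambda_0(N\lambda\Phi_{i,j}(k,t))} \right] = \sum_{s=0}^{k} E\left[ e^{\sum_{t=1}^{k}\sum_{j=1}^{N} \Lambda_0(N\lambda\Phi_{i,j}(k,t))} \, \mathcal{I}_{\mathcal{A}_s} \right], \quad (30)$$

where, we recall, $\mathcal{I}_{\mathcal{A}_s}$ is the indicator function of the event $\mathcal{A}_s$. The following lemma explains how to use the partition $\mathcal{P}_k$ to upper bound the expectation in (30).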

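Lemma 8 Let Assumptions 1-3 hold. Then:

(a) For any realization of the random matrices $W(t)$, $t = 1, 2, \ldots, k$:

$$\sum_{j=1}^{N} \Lambda_0\left(N\lambda\Phi_{i,j}(k,t)\right) \le \Lambda_0(N\lambda), \quad \forall t = 1, \ldots, k.$$

(b) Further, consider a fixed $s$ in $\{0, 1, \ldots, k\}$. If the event $\mathcal{A}_s$ occurred, then, for $i = 1, \ldots, N$:

$$\Lambda_0\left(N\lambda\Phi_{i,j}(k,t)\right) \le \max\left( \Lambda_0\left(\lambda - \epsilon N\sqrt{N}\lambda\right), \Lambda_0\left(\lambda + \epsilon N\sqrt{N}\lambda\right) \right), \quad \forall t = 1, \ldots, s, \ \forall j = 1, \ldots, N.$$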

Proof: To prove part (a) of the Lemma, note that, by convexity of $\Lambda_0$, the maximum of $\sum_{j=1}^{N} \Lambda_0(N\lambda a_j)$ over the simplex $\{a \in \mathbb{R}^N : \sum_{j=1}^{N} a_j = 1, \ a_j \ge 0, \ j = 1, \ldots, N\}$ is achieved at a corner point of the simplex. The maximum equals $\Lambda_0(N\lambda) + (N-1)\Lambda_0(0) = \Lambda_0(N\lambda)$, where we use the property from Lemma 1, part (b), that $\Lambda_0(0) = 0$. Finally, since, for any realization of the matrices $W(1), \ldots, W(k)$, the set of entries $\{\Phi_{i,j}(k,t) : j = 1, \ldots, N\}$ is a point in the simplex, the claim of part (a) of the Lemma follows.

To prove part (b) of the Lemma, suppose that event $\mathcal{A}_s$ occurred. Then, by the definition of $\mathcal{A}_s$,

$$\|\Phi(k,s) - J\| = \|W(k) \cdots W(s) - J\| \le \epsilon.$$

Using the fact that each realization $W(t)$, $t = 1, 2, \ldots$, is doubly stochastic, and using the sub-multiplicative property of the spectral norm, we have that

$$\|\Phi(k,t) - J\| = \|W(k) \cdots W(t) - J\| \le \epsilon,$$

for every $t \le s$. Then, by the equivalence of the 1-norm and the spectral norm, it follows that:

$$\left| \Phi_{i,j}(k,t) - \frac{1}{N} \right| \le \sqrt{N}\epsilon, \quad \text{for } t = 1, \ldots, s, \ \text{for all } i,j = 1, \ldots, N.$$

Finally, since $\Lambda_0$ is convex (Lemma 1, part (a)), its maximum in $\left[\lambda - \epsilon N\sqrt{N}\lambda, \ \lambda + \epsilon N\sqrt{N}\lambda\right]$ is attained at a boundary point and the claim follows. ■
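Part (a) of Lemma 8 can be sanity-checked numerically. The sketch below is only an illustration, not part of the proof: it assumes a Gaussian-type LMGF $\Lambda_0(\lambda) = -\lambda(1-\lambda)c$ (convex, with $\Lambda_0(0) = 0$), and the function names and the constant `c` are hypothetical choices made for this example.

```python
import random

def lmgf_gauss(lam, c=0.5):
    """Assumed example LMGF: Lambda_0(lam) = -lam*(1-lam)*c, convex with Lambda_0(0) = 0."""
    return -lam * (1.0 - lam) * c

def random_simplex_point(n, rng):
    """Draw a point of the probability simplex by normalizing exponential variates."""
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def lemma8a_gap(n, lam, rng):
    """Return Lambda_0(N*lam) - sum_j Lambda_0(N*lam*a_j) for a random simplex point a."""
    a = random_simplex_point(n, rng)
    lhs = sum(lmgf_gauss(n * lam * aj) for aj in a)
    return lmgf_gauss(n * lam) - lhs

rng = random.Random(0)
# Each gap should be nonnegative if the bound of part (a) holds.
gaps = [lemma8a_gap(5, 0.3, rng) for _ in range(1000)]
```

A row of $\Phi(k,t)$ plays the role of the simplex point `a`; the gap vanishes only at the corner points of the simplex.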
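We now fix $\delta \in (0, |\log r|)$. Using the results from Lemma 4 and Lemma 8, we next bound the expectation in (30) as follows:

$$\sum_{s=0}^{k} E\left[ e^{\sum_{t=1}^{k}\sum_{j=1}^{N} \Lambda_0(N\lambda\Phi_{i,j}(k,t))} \, \mathcal{I}_{\mathcal{A}_s} \right] \le \sum_{s=0}^{k} e^{ sN \max\left( \Lambda_0(\lambda - \epsilon N\sqrt{N}\lambda), \, \Lambda_0(\lambda + \epsilon N\sqrt{N}\lambda) \right) + (k-s)\Lambda_0(N\lambda) } \times C(\delta) \, e^{-(k-(s+1))(|\log r| - \delta)}. \quad (31)$$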

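To simplify the notation, we introduce the function:

$$g_0 : \mathbb{R}^2 \to \mathbb{R}, \quad g_0(\epsilon, \lambda) := \max\left( \Lambda_0\left(\lambda - \epsilon N\sqrt{N}\lambda\right), \Lambda_0\left(\lambda + \epsilon N\sqrt{N}\lambda\right) \right). \quad (32)$$

We need the following property of $g_0(\cdot,\cdot)$.

Lemma 9 Consider $g_0(\cdot,\cdot)$ in (32). Then, for every $\lambda \in \mathbb{R}$, the following holds:

$$\inf_{\epsilon > 0} g_0(\epsilon, \lambda) = \Lambda_0(\lambda).$$

Proof: Since $\Lambda_0(\cdot)$ is convex, for $\epsilon' < \epsilon$ and for fixed $\lambda$, we have that

$$g_0(\epsilon, \lambda) = \max_{\delta \in [-\epsilon, \epsilon]} \Lambda_0\left(\lambda + \delta N\sqrt{N}\lambda\right) \ge \max_{\delta \in [-\epsilon', \epsilon']} \Lambda_0\left(\lambda + \delta N\sqrt{N}\lambda\right) = g_0(\epsilon', \lambda).$$

Thus, for fixed $\lambda$, $g_0(\epsilon, \lambda)$ does not increase as $\epsilon$ decreases, so the infimum over $\epsilon > 0$ is the limit as $\epsilon \to 0$, which equals $\Lambda_0(\lambda)$ by continuity of $\Lambda_0$. The claim of the Lemma follows. ■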

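We proceed by bounding further the right hand side in (31), by rewriting $e^{-(k-(s+1))(|\log r| - \delta)}$ as $\frac{1}{r e^{\delta}} e^{-(k-s)(|\log r| - \delta)}$:

$$\sum_{s=0}^{k} \frac{C(\delta)}{r e^{\delta}} \, e^{ sN g_0(\epsilon,\lambda) + (k-s)\Lambda_0(N\lambda) - (k-s)(|\log r| - \delta) }$$

$$\le (k+1) \frac{C(\delta)}{r e^{\delta}} \max_{s \in \{0,\ldots,k\}} e^{ \left[ sN g_0(\epsilon,\lambda) + (k-s)\left(\Lambda_0(N\lambda) - (|\log r| - \delta)\right) \right] }$$

$$= (k+1) \frac{C(\delta)}{r e^{\delta}} \, e^{ \max_{s \in \{0,\ldots,k\}} \left[ sN g_0(\epsilon,\lambda) + (k-s)\left(\Lambda_0(N\lambda) - (|\log r| - \delta)\right) \right] }$$

$$\le (k+1) \frac{C(\delta)}{r e^{\delta}} \, e^{ k \max_{\theta \in [0,1]} \left[ \theta N g_0(\epsilon,\lambda) + (1-\theta)\left(\Lambda_0(N\lambda) - (|\log r| - \delta)\right) \right] }$$

$$= (k+1) \frac{C(\delta)}{r e^{\delta}} \, e^{ k \max\left\{ N g_0(\epsilon,\lambda), \ \Lambda_0(N\lambda) - (|\log r| - \delta) \right\} }. \quad (33)$$

The second inequality follows by introducing $\theta := \frac{s}{k}$ and by enlarging the set for $\theta$ from $\{0, \frac{1}{k}, \ldots, 1\}$ to the continuous interval $[0, 1]$. Taking the log and dividing by $k$, from (27) and (33) we get: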



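$$\frac{1}{k} \log \alpha_i(k,\gamma) \le \frac{\log(k+1)}{k} + \frac{1}{k}\log\frac{C(\delta)}{r e^{\delta}} + \max\left\{ N g_0(\epsilon,\lambda), \ \Lambda_0(N\lambda) - (|\log r| - \delta) \right\} - N\gamma\lambda. \quad (34)$$

Taking the limsup when $k \to \infty$, the first two terms in the right hand side of (34) vanish; further, changing the sign, we get a bound on the exponent of $\alpha_i(k,\gamma)$ that holds for every $\epsilon > 0$:

$$\liminf_{k \to \infty} -\frac{1}{k} \log \alpha_i(k,\gamma) \ge -\max\left\{ N g_0(\epsilon,\lambda), \ \Lambda_0(N\lambda) - (|\log r| - \delta) \right\} + N\gamma\lambda.$$

By Lemma 9, as $\epsilon \to 0$, $N g_0(\epsilon,\lambda)$ decreases to $N\Lambda_0(\lambda)$; further, letting $\delta \to 0$, we get

$$\liminf_{k \to \infty} -\frac{1}{k} \log \alpha_i(k,\gamma) \ge -\max\left\{ N\Lambda_0(\lambda), \ \Lambda_0(N\lambda) - |\log r| \right\} + N\gamma\lambda. \quad (35)$$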

The previous bound on the exponent of the probability of false alarm holds for any $\lambda \ge 0$. To get the best bound, we maximize the expression on the right hand side of (35) over $\lambda \in [0, \infty)$. (We refer to Figure 1 to help illustrate the bounds $B_0(\gamma)$ and $B_1(\gamma)$ for discrete-valued observations $Y_i(t)$ over a 5-point alphabet.) To this end, introduce

$$\Phi_0(\lambda) := \max\left\{ N\Lambda_0(\lambda), \ \Lambda_0(N\lambda) - |\log r| \right\}. \quad (36)$$

We show that the best bound equals $B_0(\gamma)$ in (23), i.e.:

$$B_0(\gamma) = \max_{\lambda \ge 0} \ N\gamma\lambda - \Phi_0(\lambda). \quad (37)$$

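From the first order optimality conditions, for a fixed $\gamma$, an optimizer $\lambda^\star = \lambda^\star(\gamma)$ (if it exists) of the objective in (37) is a point that satisfies:

$$N\gamma \in \partial\Phi_0(\lambda^\star), \quad \lambda^\star \ge 0, \quad (38)$$

where $\partial\Phi_0(\lambda)$ denotes the subdifferential set of $\Phi_0$ at $\lambda$. We next characterize $\partial\Phi_0(\lambda)$, for $\lambda \ge 0$. Recall the zero $\lambda_0^{s}$ of $\Delta_0(\cdot)$ from Theorem 5. The subdifferential $\partial\Phi_0(\lambda)$ is:

$$\partial\Phi_0(\lambda) = \begin{cases} \{N\Lambda_0'(\lambda)\}, & \text{for } \lambda \in [0, \lambda_0^{s}) \\ \left[N\Lambda_0'(\lambda), \ N\Lambda_0'(N\lambda)\right], & \text{for } \lambda = \lambda_0^{s} \\ \{N\Lambda_0'(N\lambda)\}, & \text{for } \lambda > \lambda_0^{s}. \end{cases} \quad (39)$$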

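We next find $B_0(\gamma)$ for any $\gamma \in (\gamma_0, \gamma_1)$, by finding $\lambda^\star = \lambda^\star(\gamma)$ for any $\gamma \in (\gamma_0, \gamma_1)$. Recall $\gamma_0^{-}$ and $\gamma_0^{+}$ from Theorem 5. We separately consider three regions: 1) $\gamma \in [\gamma_0, \gamma_0^{-}]$; 2) $\gamma \in (\gamma_0^{-}, \gamma_0^{+})$; and 3) $\gamma \in [\gamma_0^{+}, \gamma_1]$. For the first region, recall that $\Lambda_0'(0) = \gamma_0$, i.e., for $\gamma = \gamma_0$, equation (38) holds (only) for $\lambda^\star = 0$. Also, for $\gamma = \gamma_0^{-}$, we have $\Lambda_0'(\lambda_0^{s}) = \gamma_0^{-}$, i.e., equation (38) holds (only) for $\lambda^\star = \lambda_0^{s}$. Because $\Lambda_0'(\lambda)$ is continuous and strictly increasing on $\lambda \in [0, \lambda_0^{s}]$, it follows that, for any $\gamma \in [\gamma_0, \gamma_0^{-}]$, there exists a solution $\lambda^\star$ to (38), it is unique, and it lies in $[0, \lambda_0^{s}]$. Now, we calculate $B_0(\gamma)$:

$$B_0(\gamma) = N\lambda^\star\gamma - \Phi_0(\lambda^\star) = N\lambda^\star\gamma - N\Lambda_0(\lambda^\star) \quad (40)$$

$$= N\left(\lambda^\star\gamma - \Lambda_0(\lambda^\star)\right) = N \sup_{\lambda \ge 0} \left(\lambda\gamma - \Lambda_0(\lambda)\right) = N I_0(\gamma), \quad (41)$$

where we used the fact that $\Phi_0(\lambda^\star) = N\Lambda_0(\lambda^\star)$ (because $\lambda^\star \le \lambda_0^{s}$), and the definition of the function $I_0(\cdot)$ in (12). We now consider the second region. Fix $\gamma \in (\gamma_0^{-}, \gamma_0^{+})$. It is easy to verify, from (39), that $\lambda^\star = \lambda_0^{s}$ is the solution to (38). Thus, we calculate $B_0(\gamma)$ as follows:

$$B_0(\gamma) = N\lambda_0^{s}\gamma - \Phi_0(\lambda_0^{s}) = N\lambda_0^{s}\gamma - N\Lambda_0(\lambda_0^{s}) \quad (42)$$

$$= N\lambda_0^{s}(\gamma - \gamma_0^{-}) + N\lambda_0^{s}\gamma_0^{-} - N\Lambda_0(\lambda_0^{s}) = N\lambda_0^{s}(\gamma - \gamma_0^{-}) + N I_0(\gamma_0^{-}), \quad (43)$$

where we used the fact that $\lambda_0^{s}\gamma_0^{-} - \Lambda_0(\lambda_0^{s}) = \sup_{\lambda \ge 0} \left(\lambda\gamma_0^{-} - \Lambda_0(\lambda)\right) = I_0(\gamma_0^{-})$. The proof for the third region is analogous to the proof for the first region.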
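The optimization in (37) is one-dimensional, so it is easy to evaluate numerically. The following sketch evaluates $B_0(\gamma)$ by a plain grid search over $\lambda$ and compares it with $N I_0(\gamma)$. It is an illustration under stated assumptions: the LMGF is taken to be the Gaussian-type function $\Lambda_0(\lambda) = -\lambda(1-\lambda)c$, for which $I_0(\gamma) = (\gamma + c)^2/(4c)$, and the grid parameters and function names are hypothetical.

```python
import math

def lmgf0(lam, c):
    """Assumed example LMGF: Lambda_0(lam) = -lam*(1-lam)*c."""
    return -lam * (1.0 - lam) * c

def phi0(lam, n, c, abs_log_r):
    """Phi_0(lam) = max{N*Lambda_0(lam), Lambda_0(N*lam) - |log r|}, as in (36)."""
    return max(n * lmgf0(lam, c), lmgf0(n * lam, c) - abs_log_r)

def b0(gamma, n, c, abs_log_r, grid=20001, lam_max=2.0):
    """Grid-search approximation of B_0(gamma) = max_{lam >= 0} N*gamma*lam - Phi_0(lam)."""
    best = -math.inf
    for i in range(grid):
        lam = lam_max * i / (grid - 1)
        best = max(best, n * gamma * lam - phi0(lam, n, c, abs_log_r))
    return best

def rate0(gamma, c):
    """I_0(gamma) = sup_lam (lam*gamma - Lambda_0(lam)) for the assumed LMGF."""
    return (gamma + c) ** 2 / (4.0 * c)

well_connected = b0(0.0, 10, 0.5, 1000.0)   # very large |log r|
poorly_connected = b0(0.0, 10, 0.5, 0.1)    # small |log r|
```

With a very large $|\log r|$ the second term in (36) never dominates, so the grid search returns $N I_0(\gamma)$; with a small $|\log r|$ the value is strictly smaller but still positive.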

For a proof of the claim on the probability of miss $\beta_i(k,\gamma) = P\left(x_i(k) < \gamma \,|\, H_1\right)$, we proceed analogously to (26), where instead of $\zeta \ge 0$, we now use $\zeta \le 0$ (and, hence, the proof proceeds with $\lambda \le 0$).

Proof of Corollary 6: We first prove part (a). Consider the error probability $P_{e,i}(k,\gamma)$ in (19). By Lemma 1.2.15 in [30], we have that:

$$\liminf_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,\gamma) = \min\left\{ \liminf_{k \to \infty} -\frac{1}{k}\log \alpha_i(k,\gamma), \ \liminf_{k \to \infty} -\frac{1}{k}\log \beta_i(k,\gamma) \right\} \ge \min\{B_0(\gamma), B_1(\gamma)\},$$

where the last inequality is by Theorem 5. We now show that $\min\{B_0(\gamma), B_1(\gamma)\} > 0$ for all $\gamma \in (\gamma_0, \gamma_1)$. First, from the expression for $B_0(\gamma)$ in Theorem 5, for $|\log r| > 0$, we have: $B_0(\gamma_0) = N I_0(\gamma_0) = 0$, and $B_0(\gamma) = N I_0(\gamma) > 0$ for any $\gamma \in (\gamma_0, \gamma_0^{-})$. As the function $B_0(\cdot)$ is convex, we conclude that $B_0(\gamma) > 0$, for all $\gamma > \gamma_0$. (The same conclusion holds under $|\log r| = 0$, by replacing $N I_0(\gamma)$ with $I_0(\gamma) + |\log r| = I_0(\gamma)$.) Analogously, it can be shown that $B_1(\gamma) > 0$ for all $\gamma < \gamma_1$, and so $\min\{B_0(\gamma), B_1(\gamma)\} > 0$, for all $\gamma \in (\gamma_0, \gamma_1)$.

We now calculate $\max_{\gamma \in (\gamma_0, \gamma_1)} \min\{B_0(\gamma), B_1(\gamma)\}$. Consider the function $\Delta_B(\gamma) := B_1(\gamma) - B_0(\gamma)$. Using the definition of $B_0(\gamma)$ in Theorem 5, and taking the subdifferential of $B_0(\gamma)$ at any point $\gamma \in (\gamma_0, \gamma_1)$, it is easy to show that $B_0'(\gamma) > 0$, for any subgradient $B_0'(\gamma) \in \partial B_0(\gamma)$, which implies that $B_0(\cdot)$ is strictly increasing on $\gamma \in (\gamma_0, \gamma_1)$. Similarly, it can be shown that $B_1(\cdot)$ is strictly decreasing on $\gamma \in (\gamma_0, \gamma_1)$. Further, using the properties that $I_0(\gamma_0) = 0$ and $I_1(\gamma_1) = 0$, we have $\Delta_B(\gamma_0) = B_1(\gamma_0) > 0$, and $\Delta_B(\gamma_1) = -B_0(\gamma_1) < 0$. By the previous two observations, $\Delta_B(\gamma)$ is strictly decreasing on $\gamma \in (\gamma_0, \gamma_1)$, with $\Delta_B(\gamma_0) > 0$ and $\Delta_B(\gamma_1) < 0$. Thus, $\Delta_B(\cdot)$ has a unique zero $\gamma^\star$ in $(\gamma_0, \gamma_1)$. Now, the fact that $\max_{\gamma \in (\gamma_0, \gamma_1)} \min\{B_0(\gamma), B_1(\gamma)\} = B_0(\gamma^\star) = B_1(\gamma^\star)$ follows because $B_0(\cdot)$ is strictly increasing and $B_1(\cdot)$ is strictly decreasing on $(\gamma_0, \gamma_1)$. This completes the proof of part (a).

We now prove part (b). Suppose that $|\log r| \ge \mathrm{thr}(\Lambda_0, N)$. We show that, for $\gamma = 0$:

$$B_0(0) = N I_0(0), \quad B_1(0) = N I_1(0) = N I_0(0). \quad (44)$$

(The last equality in (44) holds because $I_1(0) = (I_0(\gamma) - \gamma)|_{\gamma = 0} = I_0(0)$.) Equations (44) mean that $B_0(0) = B_1(0)$. Further, $0 \in (\gamma_0, \gamma_1)$, and, from part (a), $\gamma^\star$ is unique, and so $\gamma^\star$ has to be 0. This shows that $\sup_{\gamma \in (\gamma_0, \gamma_1)} \min\{B_0(\gamma), B_1(\gamma)\} = N I_0(0) = N C_{ind}$, and so, by part (a):

$$\liminf_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,0) \ge N C_{ind}. \quad (45)$$

On the other hand,

$$\limsup_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,0) \le N C_{ind}, \quad (46)$$

because, by the Chernoff lemma [30], for any test (with the corresponding error probability $P_e(k,\gamma)$), we have that $\limsup_{k \to \infty} -\frac{1}{k}\log P_e(k,\gamma) \le N C_{ind}$. Combining (45) and (46) yields:

$$N C_{ind} \le \liminf_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,0) \le \limsup_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,0) \le N C_{ind}.$$

To complete the proof of part (b), it remains to show (44). We prove only the equality for $B_0$, as the equality for $B_1$ follows similarly. Let $\lambda^{\bullet}$ denote the minimizer of $\Lambda_0$. Because $|\log r| \ge \mathrm{thr}(\Lambda_0, N)$, we have, from the definition of $\Phi_0(\cdot)$ in (36), that $\Phi_0(\lambda^{\bullet}) = N\Lambda_0(\lambda^{\bullet})$. Recall that $B_0(0) = -\Phi_0(\lambda^\star)$, where $\lambda^\star$ is a point for which (38) holds for $\gamma = 0$. However, because $\partial\Phi_0(\lambda^{\bullet}) = \{N\Lambda_0'(\lambda^{\bullet})\}$, and $\Lambda_0'(\lambda^{\bullet}) = 0$, it follows that $\lambda^\star = \lambda^{\bullet}$ and $B_0(0) = -\Phi_0(\lambda^{\bullet}) = -N\Lambda_0(\lambda^{\bullet}) = N I_0(0)$, which proves (44). Thus, the result in part (b) of the Corollary follows.

■

IV. Examples 

This section illustrates our main results for several examples of the distributions of the sensor observations. Subsection IV-A compares the Gaussian and Laplace distributions, both with a finite number of sensors $N$ and when $N \to \infty$. Subsection IV-B considers discrete distributions with finite support, and, in more detail, binary distributions. Finally, Subsection IV-C numerically demonstrates that our theoretical lower bound on the error exponent (24) is tight. Subsection IV-C also shows, through a symmetric, tractable example, how distributed detection performance depends on the network topology (nodes' degree and link occurrence/failure probability).

A. Gaussian distribution versus Laplace distribution 

Gaussian distribution. We now study the detection of a signal in additive Gaussian noise; $Y_i(t)$ has the following density:

$$f_G(y) = \begin{cases} \frac{1}{\sqrt{2\pi}\sigma_G} e^{-\frac{(y - m_G)^2}{2\sigma_G^2}}, & H_1 \\ \frac{1}{\sqrt{2\pi}\sigma_G} e^{-\frac{y^2}{2\sigma_G^2}}, & H_0. \end{cases}$$

The LMGF is given by: $\Lambda_{0,G}(\lambda) = -\frac{\lambda(1-\lambda)}{2}\frac{m_G^2}{\sigma_G^2}$. The minimum of $\Lambda_{0,G}$ is achieved at $\lambda^{\bullet} = \frac{1}{2}$, and the per sensor Chernoff information is $C_{ind,G} = \frac{m_G^2}{8\sigma_G^2}$.

Applying Corollary 6, we get the sufficient condition for optimality:

$$|\log r| \ge \Lambda_{0,G}\left(\frac{N}{2}\right) - N\Lambda_{0,G}\left(\frac{1}{2}\right) = N(N-1)C_{ind,G}. \quad (47)$$

Since $\Lambda_0(\lambda) = \Lambda_1(\lambda)$, the two conditions from the Corollary here reduce to a single condition in (24).

Now, let the number of sensors $N \to \infty$, while keeping the total Chernoff information constant, i.e., not dependent on $N$; that is, $C_G := N C_{ind,G} = \mathrm{const}$, $C_{ind,G}(N) = C_G/N$. Intuitively, as $N$ increases, we deploy more and more sensors over a region (denser deployment), but, on the other hand, the sensors' quality becomes worse and worse. The increase of $N$ is balanced in such a way that the total information offered by all sensors stays constant with $N$. Our goal is to determine how the optimality threshold on the network connectivity $\mathrm{thr}(\Lambda_{0,G}, N)$ depends on $N$. We can see from (47) that the optimality threshold for the distributed detector in the Gaussian case equals:

$$\mathrm{thr}(\Lambda_{0,G}, N) = (N-1)C_G. \quad (48)$$
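The Gaussian quantities above are simple enough to check directly. The sketch below evaluates the LMGF, the per sensor Chernoff information, and the threshold in (47) and (48); the function names and the parameter `snr` (standing for $m_G^2/\sigma_G^2$) are hypothetical names introduced for this illustration.

```python
def lmgf_gauss(lam, snr):
    """Lambda_{0,G}(lam) = -lam*(1-lam)*snr/2, with snr = m_G^2 / sigma_G^2."""
    return -lam * (1.0 - lam) * snr / 2.0

def chernoff_gauss(snr):
    """Per sensor Chernoff information: -Lambda_{0,G}(1/2) = snr/8."""
    return -lmgf_gauss(0.5, snr)

def threshold_gauss(n, snr):
    """Right hand side of (47): Lambda_{0,G}(N/2) - N*Lambda_{0,G}(1/2)."""
    return lmgf_gauss(n / 2.0, snr) - n * lmgf_gauss(0.5, snr)

# Parameters used later for Figure 2 (left): N = 10, m_G^2/sigma_G^2 = 0.7563.
n_sensors, snr = 10, 0.7563
c_ind = chernoff_gauss(snr)
thr = threshold_gauss(n_sensors, snr)
```

Evaluating `thr` against `n_sensors * (n_sensors - 1) * c_ind` reproduces the closed form in (47), and writing $C_G = N C_{ind,G}$ gives (48).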

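Laplace distribution. We next study the optimality conditions for the sensor observations with Laplace distribution. The density of $Y_i(t)$ is:

$$f_L(y) = \begin{cases} \frac{1}{2 b_L} e^{-\frac{|y - m_L|}{b_L}}, & H_1 \\ \frac{1}{2 b_L} e^{-\frac{|y|}{b_L}}, & H_0. \end{cases}$$

The LMGF has the following form:

$$\Lambda_{0,L}(\lambda) = \log\left( \frac{1-\lambda}{1-2\lambda} e^{-\lambda\frac{m_L}{b_L}} - \frac{\lambda}{1-2\lambda} e^{-(1-\lambda)\frac{m_L}{b_L}} \right).$$

Again, the minimum is at $\lambda^{\bullet} = \frac{1}{2}$, and the per sensor Chernoff information is

$$C_{ind,L} = \frac{m_L}{2 b_L} - \log\left(1 + \frac{m_L}{2 b_L}\right).$$

The optimality condition in (24) becomes:

$$|\log r| \ge \Lambda_{0,L}\left(\frac{N}{2}\right) - N\Lambda_{0,L}\left(\frac{1}{2}\right) \quad (49)$$

$$= \log\left( \frac{2-N}{2-2N} e^{-\frac{N}{2}\frac{m_L}{b_L}} - \frac{N}{2-2N} e^{-\left(1-\frac{N}{2}\right)\frac{m_L}{b_L}} \right) - N\log\left(1 + \frac{m_L}{2 b_L}\right) + N\frac{m_L}{2 b_L}.$$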
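As a check on the Laplace formulas, the sketch below evaluates the LMGF, the Chernoff information, and the threshold in (49). It is a minimal illustration: the function names are hypothetical, and the value at $\lambda = 1/2$ (where the closed form is a $0/0$ expression) is taken from its limit, $\log(1 + \frac{m_L}{2b_L}) - \frac{m_L}{2b_L}$.

```python
import math

def lmgf_laplace(lam, m, b):
    """Lambda_{0,L}(lam) for a Laplace shift-in-mean problem with shift m and scale b."""
    c = m / b
    if abs(1.0 - 2.0 * lam) < 1e-12:
        # Limit of the closed form as lam -> 1/2.
        return math.log(1.0 + c / 2.0) - c / 2.0
    num = (1.0 - lam) * math.exp(-lam * c) - lam * math.exp(-(1.0 - lam) * c)
    return math.log(num / (1.0 - 2.0 * lam))

def chernoff_laplace(m, b):
    """Per sensor Chernoff information: -Lambda_{0,L}(1/2)."""
    return -lmgf_laplace(0.5, m, b)

def threshold_laplace(n, m, b):
    """Right hand side of (49): Lambda_{0,L}(N/2) - N*Lambda_{0,L}(1/2)."""
    return lmgf_laplace(n / 2.0, m, b) - n * lmgf_laplace(0.5, m, b)

c_ind_l = chernoff_laplace(1.0, 1.0)      # m_L = b_L = 1
thr_l = threshold_laplace(10, 1.0, 1.0)   # N = 10
thr_g = 10 * 9 * c_ind_l                  # Gaussian threshold (47) at equal C_ind
```

For $m_L = b_L = 1$ this gives a per sensor Chernoff information of about 0.0945, and a Laplace threshold below the Gaussian threshold $N(N-1)C_{ind}$ computed at the same Chernoff information.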

Gaussian versus Laplace distribution. It is now interesting to compare the Gaussian and the Laplace case under equal per sensor Chernoff information $C_{ind,L} = C_{ind,G}$. Figure 2 (left) plots the LMGF for the Gaussian and Laplace distributions, for $N = 10$, $C_{ind} = C_{ind,L} = C_{ind,G} = 0.0945$, $b_L = 1$, $m_L = 1$, and $m_G^2/\sigma_G^2 = 0.7563 = 8 C_{ind}$. By (25), the optimality threshold equals

$$|N\Lambda_0(1/2)| + |\Lambda_0(N/2)|,$$

as $\lambda^{\bullet} = 1/2$, for both the Gaussian and Laplace distributions. The threshold can be estimated from Figure 2 (left): solid lines plot the functions $\Lambda_0(N\lambda)$ for the two different distributions, while dashed lines plot the functions $N\Lambda_0(\lambda)$. For both solid and dashed lines, the Gaussian distribution corresponds to the more curved functions. We see that the threshold is larger for the Gaussian case. This means that, for a certain range $r \in (r_{\min}, r_{\max})$, the distributed detector with Laplace sensors is asymptotically optimal, while with Gaussian sensors the distributed detector may not be optimal, even though it uses the same network infrastructure (equal $r$) and has equal per sensor Chernoff information. (See also Figure 2 (right) for another illustration of this effect.)

We now compare the Gaussian and Laplace distributions when $N \to \infty$, and we keep the Gaussian total Chernoff information $C_G$ constant with $N$. Let the Laplace distribution parameters vary with $N$ as:

$$m_L = m_L(N) = \frac{2\sqrt{2 C_G}}{\sqrt{N}}, \quad b_L = b_L(N) = 1.$$

We can show that the total Chernoff information $C_L(N) \to C_G$ as $N \to \infty$, and so the Gaussian and the Laplace centralized detectors become equivalent. On the other hand, the threshold for the Gaussian distributed detector is given by (48) while, for the Laplace detector, using (49) and a Taylor expansion, we get that the optimality threshold is approximately:

$$\mathrm{thr}(\Lambda_{0,L}, N) \approx \sqrt{2 C_G}\sqrt{N}.$$

Hence, the required $|\log r|$ to achieve the optimal error exponent grows much slower with the Laplace distribution than with the Gaussian distribution.
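The different growth rates can be illustrated numerically. The sketch below is hedged: it assumes the ratio $m_L(N)/b_L(N) = \sqrt{8 C_G/N}$, one scaling under which the total Laplace Chernoff information tends to $C_G$ (this scaling and all function names are assumptions made for the illustration), and it compares the exact Laplace threshold from (49) with $\sqrt{2 C_G N}$ and with the Gaussian threshold (48).

```python
import math

def lmgf_laplace(lam, c):
    """Laplace LMGF as a function of lam and of the ratio c = m_L / b_L."""
    if abs(1.0 - 2.0 * lam) < 1e-12:
        return math.log(1.0 + c / 2.0) - c / 2.0   # limit as lam -> 1/2
    num = (1.0 - lam) * math.exp(-lam * c) - lam * math.exp(-(1.0 - lam) * c)
    return math.log(num / (1.0 - 2.0 * lam))

def thr_laplace(n, c_total):
    """Threshold (49) under the assumed scaling m_L/b_L = sqrt(8*C_G/N)."""
    c = math.sqrt(8.0 * c_total / n)
    return lmgf_laplace(n / 2.0, c) - n * lmgf_laplace(0.5, c)

def thr_gauss(n, c_total):
    """Gaussian threshold (48): (N - 1) * C_G."""
    return (n - 1) * c_total

n_large, c_total = 10000, 1.0
ratio = thr_laplace(n_large, c_total) / math.sqrt(2.0 * c_total * n_large)
total_chernoff = -n_large * lmgf_laplace(0.5, math.sqrt(8.0 * c_total / n_large))
```

For large $N$ the ratio is close to one, while the Gaussian threshold grows linearly in $N$.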




Fig. 2. Left: LMGFs for Gaussian and Laplace distributions with equal per sensor Chernoff information, for $N = 10$, $C_{ind} = C_{ind,L} = C_{ind,G} = 0.0945$, $b_L = 1$, $m_L = 1$, and $m_G^2/\sigma_G^2 = 0.7563 = 8 C_{ind}$. Solid lines plot the functions $\Lambda_0(N\lambda)$ for the two distributions, while dashed lines plot the functions $N\Lambda_0(\lambda)$. For both solid and dashed lines, the Gaussian distribution corresponds to the more curved functions. The optimality threshold in (25) is given by $|N\Lambda_0(1/2)| + |\Lambda_0(N/2)|$, as $\lambda^{\bullet} = 1/2$. Right: Lower bound on the error exponent in (24) and Monte Carlo estimate of the error exponent versus $|\log r|$ for the Gaussian and Laplace sensor observations: $N = 20$, $C_{ind} = C_{ind,L} = C_{ind,G} = 0.005$, $b_L = 1$, $m_L = 0.2$, and $m_G^2/\sigma_G^2 = 0.04 = 8 C_{ind}$.

B. Discrete distributions

We now consider the case when the support of the sensor observations under both hypotheses is a finite alphabet $\{a_1, a_2, \ldots, a_M\}$. This case is of practical interest when, for example, the sensing device has an analog-to-digital converter with a finite range; hence, the observations take only a finite number of values. Specifically, the distribution of $Y_i(k)$, $\forall i$, $\forall k$, is given by:

$$P(Y_i(k) = a_m) = \begin{cases} q_m, & H_1 \\ p_m, & H_0 \end{cases}, \quad m = 1, \ldots, M. \quad (50)$$

Then, the LMGF under $H_0$ equals:

$$\Lambda_0(\lambda) = \log\left( \sum_{m=1}^{M} q_m^{\lambda} p_m^{1-\lambda} \right).$$



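Note that $\Lambda_0(\lambda)$ is finite on $\mathbb{R}$. Due to concavity of $-\Lambda_0(\cdot)$, the argument of the Chernoff information $\lambda^{\bullet}$ ($C_{ind} = \max_{\lambda \in [0,1]} \{-\Lambda_0(\lambda)\} = -\Lambda_0(\lambda^{\bullet})$) can, in general, be efficiently computed numerically, for example, by the Newton method (see, e.g., [33], for details on the Newton method). It can be shown, defining $c_m = \log\left(\frac{q_m}{p_m}\right)$, that the Newton direction, e.g., [33], equals:

$$d(\lambda) = -\left(\Lambda_0''(\lambda)\right)^{-1}\Lambda_0'(\lambda), \quad \Lambda_0'(\lambda) = \frac{\sum_{m=1}^{M} c_m p_m e^{\lambda c_m}}{\sum_{m=1}^{M} p_m e^{\lambda c_m}}, \quad \Lambda_0''(\lambda) = \frac{\sum_{m=1}^{M} c_m^2 p_m e^{\lambda c_m}}{\sum_{m=1}^{M} p_m e^{\lambda c_m}} - \left(\Lambda_0'(\lambda)\right)^2.$$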
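The Newton iteration for $\lambda^{\bullet}$ can be sketched in a few lines. This is a minimal, undamped version under stated assumptions: the derivatives are obtained by differentiating $\Lambda_0(\lambda) = \log\sum_m p_m e^{\lambda c_m}$ directly, the starting point $\lambda = 1/2$ and the function names are hypothetical, and no step-size control or projection onto $[0,1]$ is included (a production implementation would likely add one).

```python
import math

def lmgf_derivs(lam, p, q):
    """Return (Lambda_0, Lambda_0', Lambda_0'') at lam for pmfs p (under H0) and q (under H1)."""
    c = [math.log(qm / pm) for pm, qm in zip(p, q)]
    w = [pm * math.exp(lam * cm) for pm, cm in zip(p, c)]
    z = sum(w)
    mean = sum(wm * cm for wm, cm in zip(w, c)) / z
    second = sum(wm * cm * cm for wm, cm in zip(w, c)) / z
    return math.log(z), mean, second - mean * mean

def chernoff_newton(p, q, lam=0.5, tol=1e-12, max_iter=50):
    """Undamped Newton iteration on Lambda_0'(lam) = 0; returns (lam, C_ind)."""
    for _ in range(max_iter):
        _, d1, d2 = lmgf_derivs(lam, p, q)
        step = -d1 / d2          # Newton direction d(lam)
        lam += step
        if abs(step) < tol:
            break
    value, _, _ = lmgf_derivs(lam, p, q)
    return lam, -value

# Binary example with p = 0.1 and q = 0.12.
lam_star, c_ind = chernoff_newton([0.1, 0.9], [0.12, 0.88])
```

Because $\Lambda_0$ is smooth and convex with $\Lambda_0(0) = \Lambda_0(1) = 0$, the stationary point found this way lies in $(0,1)$ whenever the two distributions differ.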



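Binary observations. To gain more intuition and obtain analytical results, we consider (50) with $M = 2$, i.e., binary sensors, with $p_2 = 1 - p_1 = 1 - p$ and $q_2 = 1 - q_1 = 1 - q$. Suppose further that $p < q$. We can show that the negative of the per sensor Chernoff information, $\Lambda_{0,bin}(\lambda^{\bullet})$, and the quantity $\lambda^{\bullet}$ are:

$$-C_{ind} = \Lambda_{0,bin}(\lambda^{\bullet}) = \lambda^{\bullet}\log\frac{q}{p} + \log p + \log\left(1 - \frac{\log\frac{q}{p}}{\log\frac{1-q}{1-p}}\right),$$

$$\lambda^{\bullet} = \frac{\log\frac{p}{1-p} + \log\left(\frac{\log\frac{q}{p}}{\log\frac{1-p}{1-q}}\right)}{\log\left(\frac{1-q}{1-p}\right) - \log\left(\frac{q}{p}\right)}.$$

Further, note that:

$$\Lambda_{0,bin}(N\lambda^{\bullet}) = \log\left( p\left(\frac{q}{p}\right)^{N\lambda^{\bullet}} + (1-p)\left(\frac{1-q}{1-p}\right)^{N\lambda^{\bullet}} \right) \le \log\left(\frac{q}{p}\right)^{N\lambda^{\bullet}} = N\lambda^{\bullet}\log\left(\frac{q}{p}\right). \quad (51)$$

Also, we can show similarly that:

$$\Lambda_{0,bin}\left(1 - N(1-\lambda^{\bullet})\right) \le N(1-\lambda^{\bullet})\log\left(\frac{1-p}{1-q}\right). \quad (52)$$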
Combining (51) and (52), and applying Corollary 6 (equation (24)), we get that a sufficient condition for asymptotic optimality is:

$$|\log r| \ge \max\left\{ N\log\frac{1}{p} - N\log\left(1 + \frac{\log\frac{q}{p}}{\log\frac{1-p}{1-q}}\right), \ N\log\frac{1}{1-q} - N\log\left(1 + \frac{\log\frac{1-p}{1-q}}{\log\frac{q}{p}}\right) \right\}.$$

Dropping the negative terms, we further obtain a much simpler (looser) sufficient condition for optimality:

$$|\log r| \ge N\max\left\{ |\log p|, \ |\log(1-q)| \right\}. \quad (53)$$

The expression in (53) is intuitive. Consider, for example, the case $p = 1/2$, so that the right hand side in (53) simplifies to $N|\log(1-q)|$. Let $q$ vary from $1/2$ to $1$. Then, as $q$ increases, the per sensor Chernoff information increases, and the optimal centralized detector has better and better performance (error exponent). That is, the centralized detector has a very low error probability after a very short observation interval $k$. Hence, for larger $q$, the distributed detector needs more connectivity to be able to "catch up" with the performance of the centralized detector. We compare numerically Gaussian and binary distributed detectors with equal per sensor Chernoff information, for $N = 32$ sensors, $C_{ind} = 5.11 \cdot 10^{-4}$, $m_G^2/\sigma_G^2 = 8 C_{ind}$, $p = 0.1$, and $q = 0.12$. The binary detector requires more connectivity to achieve asymptotic optimality ($r \approx 0.25$), while the Gaussian detector requires $r \approx 0.5$.
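The binary formulas can be evaluated directly to see how loose each sufficient condition is. The sketch below is an illustration with hypothetical function names; it computes $\lambda^{\bullet}$ and $C_{ind}$ in closed form, the threshold from Corollary 6 evaluated with the exact LMGF, the bound obtained from (51) and (52), and the simplified bound (53).

```python
import math

def binary_chernoff(p, q):
    """Closed-form lam_bullet and per sensor Chernoff information for binary sensors (p < q)."""
    a = math.log(q / p)                # log(q/p) > 0
    b = math.log((1 - q) / (1 - p))    # log((1-q)/(1-p)) < 0
    lam = (math.log(p / (1 - p)) + math.log(a / (-b))) / (b - a)
    neg_c = lam * a + math.log(p) + math.log(1 - a / b)
    return lam, -neg_c

def lmgf_bin(lam, p, q):
    """Lambda_{0,bin}(lam) = log(p^(1-lam) q^lam + (1-p)^(1-lam) (1-q)^lam)."""
    return math.log(p ** (1 - lam) * q ** lam + (1 - p) ** (1 - lam) * (1 - q) ** lam)

def thresholds(n, p, q):
    """Return (exact, bound, simple): exact LMGF values, bound via (51)-(52), and (53)."""
    lam, c = binary_chernoff(p, q)
    exact = max(lmgf_bin(n * lam, p, q), lmgf_bin(1 - n * (1 - lam), p, q)) + n * c
    a = math.log(q / p)
    b_abs = math.log((1 - p) / (1 - q))
    bound = max(n * math.log(1 / p) - n * math.log(1 + a / b_abs),
                n * math.log(1 / (1 - q)) - n * math.log(1 + b_abs / a))
    simple = n * max(abs(math.log(p)), abs(math.log(1 - q)))
    return exact, bound, simple

lam_b, c_b = binary_chernoff(0.1, 0.12)
exact, bound, simple = thresholds(32, 0.1, 0.12)
```

For $p = 0.1$ and $q = 0.12$, the computed Chernoff information agrees with the value $5.11 \cdot 10^{-4}$ quoted above, and the three quantities are ordered from tightest to loosest.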

C. Tightness of the error exponent lower bound in (24) and impact of the network topology 

Assessment of the tightness of the error exponent lower bound in (24). We note that the result in (24) is a theoretical lower bound on the error exponent. In particular, the condition $|\log r| \ge \mathrm{thr}(\Lambda_0, N)$ is proved to be a sufficient, but not necessary, condition for asymptotically optimal detection; in other words, (24) does not exclude the possibility of achieving asymptotic optimality for $|\log r|$ smaller than $\mathrm{thr}(\Lambda_0, N)$. In order to assess the tightness of (24) (for both the Gaussian and Laplace distributions), we perform Monte Carlo simulations to estimate the actual error exponent and compare it with (24). We consider $N = 20$ sensors and fix the sensor observation distributions with the following parameters: $C_{ind} = C_{ind,L} = C_{ind,G} = 0.005$, $b_L = 1$, $m_L = 0.2$, and $m_G^2/\sigma_G^2 = 0.04 = 8 C_{ind}$. We vary $r$ as follows. We construct a (fixed) geometric graph with $N$ sensors by placing the nodes uniformly at random on a unit square and connecting the nodes whose distance is less than a radius. Each link is a Bernoulli random variable, equal to 1 with probability $p$ (link online), and equal to 0 with probability $1 - p$ (link offline). The link occurrences are independent in time and space. We change $r$ by varying $p$ from 0 to 0.95 in increments of 0.05. We adopt the Metropolis weights: whenever a link $\{i,j\}$ is online, we set $W_{ij}(k) = 1/(1 + \max(d_i(k), d_j(k)))$, where $d_i(k)$ is the number of neighbors of node $i$ at time $k$; when a link is offline, $W_{ij}(k) = 0$; and $W_{ii}(k) = 1 - \sum_{j \in O_i} W_{ij}(k)$, where we recall that $O_i$ is the neighborhood of node $i$. We obtain an estimate of the error probability $P_{e,i}(k)$ at sensor $i$ and time $k$ using 30,000 Monte Carlo runs of (13) per each hypothesis. We then estimate the sensor-wide average error exponent as:

$$\frac{1}{N}\sum_{i=1}^{N} \frac{\log P_{e,i}(K_1) - \log P_{e,i}(K_2)}{K_2 - K_1},$$

with $K_1 = 40$, $K_2 = 60$. That is, we estimate the error exponent as the average slope (across sensors) of the error probability curve in a semi-log scale. Figure 2 (right) plots both the theoretical lower bound on the error exponent in (24) and the Monte Carlo estimate of the error exponent versus $|\log r|$ for Gaussian and Laplace distributions. We can see that the bound (24) is tight for both distributions. Hence, the actual distributed detection performance is very close to the performance predicted by (24). (Of course, above the optimality threshold, (24) and the actual error exponent coincide and are equal to the total Chernoff information.) Also, we can see that the theoretical threshold on optimality $\mathrm{thr}(\Lambda_0, N)$ and the threshold value computed from simulation are very close. Finally, the distributed detector with Laplace observations achieves asymptotic optimality for a smaller value of $|\log r|$ ($|\log r| \approx 1.2$) than the distributed detector with Gaussian observations ($|\log r| \approx 1.6$), even though the corresponding centralized detectors are asymptotically equivalent.
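The Metropolis weight rule used in the simulation can be sketched as follows. This is only an illustration of one realization $W(k)$, not the authors' simulation code: the function name, the edge-list representation, and the ring topology used in the example are assumptions.

```python
import random

def metropolis_weights(n, edges, p_online, rng):
    """Draw one realization W(k): each edge is online with probability p_online,
    then the Metropolis weights are applied to the online links."""
    online = [e for e in edges if rng.random() < p_online]
    degree = [0] * n
    for i, j in online:
        degree[i] += 1
        degree[j] += 1
    w = [[0.0] * n for _ in range(n)]
    for i, j in online:
        # W_ij = 1 / (1 + max(d_i, d_j)) for an online link {i, j}
        w[i][j] = w[j][i] = 1.0 / (1 + max(degree[i], degree[j]))
    for i in range(n):
        # Diagonal entry makes each row sum to one (row i has zero diagonal so far).
        w[i][i] = 1.0 - sum(w[i])
    return w

# Example: ring of 6 nodes, links online with probability 0.7.
n_nodes = 6
ring = [(i, (i + 1) % n_nodes) for i in range(n_nodes)]
w_k = metropolis_weights(n_nodes, ring, 0.7, random.Random(1))
```

Every realization produced this way is symmetric and stochastic with nonnegative entries, which is the structure of $W(t)$ used in the proof of Lemma 7.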

Impact of the network topology. We have seen in the previous two subsections how detection performance depends on $r$. In order to understand how $r$ depends on the network topology, we consider a symmetric network structure, namely a regular network. For this case, we can express $r$ as an explicit (closed form) function of the nodes' degrees and the link occurrence probabilities. (Recall that the smaller $r$ is, the better the network connectivity.)

Consider a connected regular network with $N$ nodes and degree $d \ge 2$. Suppose that each link is a Bernoulli random variable, equal to 1 with probability $p$ (link online) and 0 with probability $1 - p$ (link offline), with spatio-temporally independent link occurrences. Then, it can be shown [31] that $r$ equals:

$$r = (1 - p)^d. \quad (54)$$

This expression is very intuitive. When $p$ increases, i.e., when the links are online more often, the network (on average) becomes more connected, and hence we expect that the network connectivity $|\log r|$ increases (improves). This is confirmed by (54): when $p$ increases, $r$ becomes smaller and closer to zero. Further, when $d$ increases, the network becomes more connected, and hence the network connectivity again improves. Note also that $|\log r| = d\,|\log(1 - p)|$ is a linear function of $d$.

We now recall Corollary 6 to relate distributed detection performance with $p$ and $d$. For example, for a fixed $p$, the distributed detection optimality condition becomes $d \ge \frac{\mathrm{thr}(\Lambda_0, N)}{|\log(1 - p)|}$, i.e., distributed detection is asymptotically optimal when the sensors' degree is above a threshold. Further, because $d \le N$, it follows that, for a large value of $\mathrm{thr}(\Lambda_0, N)$ and a small $p$, even networks with a very large degree (say, $d = N$) do not achieve asymptotic optimality. Intuitively, a large $\mathrm{thr}(\Lambda_0, N)$ means that the corresponding centralized detector decreases the error probability so fast in $k$ that, because of the intermittent link failures, the distributed detector cannot "catch up" with the centralized detector. Finally, when $p = 1$, the optimality condition becomes $d > 0$, i.e., distributed detection is asymptotically optimal for any $d \ge 2$. This is because, when $p = 1$, the network is always connected, and the distributed detector asymptotically "catches up" with the arbitrarily fast centralized detector. In fact, it can be shown that an arbitrarily connected network with no link failures achieves asymptotic optimality for any value of $\mathrm{thr}(\Lambda_0, N)$. (It can be shown that such a network has $r = 0$, and, consequently, the network connectivity $|\log r|$ is $\infty$.)
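The degree condition implied by (54) is a one-line computation. The sketch below uses hypothetical function names and, as an example input, the Gaussian threshold $N(N-1)C_{ind} \approx 8.508$ for $N = 10$ and $C_{ind} = 0.7563/8$ from Subsection IV-A.

```python
import math

def connectivity(p, d):
    """|log r| for r = (1 - p)^d, from (54); infinite when p = 1."""
    if p >= 1.0:
        return math.inf
    return d * abs(math.log(1.0 - p))

def min_degree(thr, p):
    """Smallest integer degree d with d * |log(1 - p)| >= thr (requires 0 < p < 1)."""
    return math.ceil(thr / abs(math.log(1.0 - p)))

thr_example = 10 * 9 * 0.7563 / 8.0     # Gaussian threshold (47) for N = 10
d_needed = min_degree(thr_example, 0.5)
```

With $p = 0.5$ the required degree exceeds $N = 10$, which illustrates the remark above: for a large threshold and unreliable links, no regular network on $N$ nodes meets the sufficient condition.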

V. Non-Identically Distributed Observations

We extend Theorem 5 and Corollary 6 to the case of (independent) non-identically distributed observations. First, we briefly explain the measurement model and define the relevant quantities. As before, let $Y_i(t)$ denote the observation of sensor $i$ at time $t$, $i = 1, \ldots, N$, $t = 1, 2, \ldots$.

Assumption A The observations of sensor $i$ are i.i.d. in time, with the following distribution:

$$Y_i(t) \sim \begin{cases} \nu_{i,1}, & H_1 \\ \nu_{i,0}, & H_0 \end{cases}, \quad i = 1, \ldots, N, \ t = 1, 2, \ldots$$

(Here we assume that $\nu_{i,1}$ and $\nu_{i,0}$ are mutually absolutely continuous, distinguishable measures, for $i = 1, \ldots, N$.) Further, the observations of different sensors are independent both in time and in space, i.e., for $i \ne j$, $Y_i(t)$ and $Y_j(k)$ are independent for all $t$ and $k$.

Under Assumption A, the form of the log-likelihood ratio test remains the same as under Assumption 1:

$$\frac{1}{Nk}\sum_{t=1}^{k}\sum_{i=1}^{N} L_i(t) \ \underset{H_0}{\overset{H_1}{\gtrless}} \ \gamma,$$

where the log-likelihood ratio at sensor $i$, $i = 1, \ldots, N$, is now:

$$L_i(t) = \log\frac{f_{i,1}(Y_i(t))}{f_{i,0}(Y_i(t))},$$

where $f_{i,l}$, $l = 0, 1$, is the density (or the probability mass) function associated with $\nu_{i,l}$. We now discuss the choice of detector thresholds $\gamma$. Let $\gamma_l = E\left[\frac{1}{N}\sum_{i=1}^{N} L_i(t) \,\Big|\, H_l\right] = \frac{1}{N}\sum_{i=1}^{N}\gamma_{i,l}$, $l = 0, 1$. We will show that, if $|\log r| > 0$, any $\gamma \in (\gamma_0, \gamma_1)$ yields an exponentially fast decay of the error probability, at any sensor. The condition $|\log r| > 0$ means that the network is connected on average, e.g., [34]; if met, then, for all $i$, $E[x_i(k) | H_l] \to \gamma_l$ as $k \to \infty$, $l = 0, 1$. (Proof is omitted for brevity.) Clearly, under identical sensors, $\gamma_{i,l} = \gamma_{j,l}$ for any $i, j$, and hence the range of detector thresholds becomes the one assumed in Section II-C.

Denote by $\Lambda_{i,0}$ the LMGF of $L_i(t)$ under hypothesis $H_0$:

$$\Lambda_{i,0} : \mathbb{R} \to (-\infty, +\infty], \quad \Lambda_{i,0}(\lambda) = \log E\left[ e^{\lambda L_i(t)} \,\big|\, H_0 \right].$$

We assume finiteness of the LMGFs of all sensors. Assumption 2 is restated explicitly as Assumption B.

Assumption B For $i = 1, \ldots, N$, $\Lambda_{i,0}(\lambda) < +\infty$, $\forall \lambda \in \mathbb{R}$.

The optimal centralized detector, with the highest error exponent, is the likelihood ratio test with zero threshold $\gamma = 0$ [30]; its error exponent is equal to the Chernoff information of the vector of all sensors' observations, and can be expressed in terms of the LMGFs as:

$$C_{tot} = \max_{\lambda \in [0,1]} -\sum_{i=1}^{N}\Lambda_{i,0}(\lambda) = -\sum_{i=1}^{N}\Lambda_{i,0}(\lambda^{\bullet}).$$

Here, $\lambda^{\bullet}$ is the minimizer of $\sum_{i=1}^{N}\Lambda_{i,0}$ over $[0, 1]$. We are now ready to state our results on the error exponent of the consensus+innovations detector for the case of non-identically distributed observations. (We continue to use $\alpha_i(k,\gamma)$, $\beta_i(k,\gamma)$, and $P_{e,i}(k,\gamma)$ to denote the false alarm, miss, and Bayes error probabilities of the distributed detector at sensor $i$.)

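Theorem 10 Let Assumptions A, B and 3 hold, and let, in addition, $|\log r| > 0$. Consider the family of distributed detectors in (13) and (14) with thresholds $\gamma \in (\gamma_0, \gamma_1)$. Then, at each sensor $i$:

$$\liminf_{k \to \infty} -\frac{1}{k}\log\alpha_i(k,\gamma) \ge B_0(\gamma) > 0, \quad \liminf_{k \to \infty} -\frac{1}{k}\log\beta_i(k,\gamma) \ge B_1(\gamma) > 0, \quad (55)$$

where

$$B_0(\gamma) = \max_{\lambda \in [0,1]} \ N\lambda\gamma - \max\left\{ \sum_{i=1}^{N}\Lambda_{i,0}(\lambda), \ \max_{i=1,\ldots,N}\Lambda_{i,0}(N\lambda) - |\log r| \right\} \quad (56)$$

$$B_1(\gamma) = \max_{\lambda \in [-1,0]} \ N\lambda\gamma - \max\left\{ \sum_{i=1}^{N}\Lambda_{i,1}(\lambda), \ \max_{i=1,\ldots,N}\Lambda_{i,1}(N\lambda) - |\log r| \right\}. \quad (57)$$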

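Corollary 11 Let Assumptions A, B and 3 hold, and let, in addition, $|\log r| > 0$. Consider the family of distributed detectors in (13) and (14) with thresholds $\gamma \in (\gamma_0, \gamma_1)$. Then:

(a) At each sensor $i$:

$$\liminf_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,\gamma) \ge \min\{B_0(\gamma), B_1(\gamma)\} > 0, \quad (58)$$

and the lower bound in (58) is maximized for the point $\gamma^\star \in (\gamma_0, \gamma_1)$ at which $B_0(\gamma^\star) = B_1(\gamma^\star)$.

(b) Consider $\lambda^{\bullet} = \arg\min_{\lambda \in [0,1]} \sum_{i=1}^{N}\Lambda_{i,0}(\lambda)$, and let:

$$\mathrm{thr}(\Lambda_{1,0}, \ldots, \Lambda_{N,0}) = \max\left\{ \max_{i=1,\ldots,N}\Lambda_{i,0}(N\lambda^{\bullet}) - \sum_{i=1}^{N}\Lambda_{i,0}(\lambda^{\bullet}), \ \max_{i=1,\ldots,N}\Lambda_{i,0}\left(1 - N(1-\lambda^{\bullet})\right) - \sum_{i=1}^{N}\Lambda_{i,0}(\lambda^{\bullet}) \right\}. \quad (59)$$

Then, when $|\log r| \ge \mathrm{thr}(\Lambda_{1,0}, \ldots, \Lambda_{N,0})$, each sensor $i$ with the detector threshold set to $\gamma = 0$ is asymptotically optimal:

$$\lim_{k \to \infty} -\frac{1}{k}\log P_{e,i}(k,0) = C_{tot}.$$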

Comparing Theorem 5 with Theorem 10, we can see that, under non-identically distributed observations, it is no longer possible to analytically characterize the lower bounds on the error exponents, $B_0(\gamma)$ and $B_1(\gamma)$. However, the objective functions (in the variable $\lambda$) in (56) and (57) are concave (by convexity of the LMGFs) and the underlying optimization variable $\lambda$ is a scalar, and, thus, $B_0(\gamma)$ and $B_1(\gamma)$ can be efficiently found by a one dimensional numerical optimization procedure, e.g., a subgradient algorithm [35].

Proof of Theorem 10: The proof of Theorem 10 mimics the proof of Theorem 5; we focus only on the steps that account for different sensors' LMGFs. First, expression (29) that upper bounds the probability of false alarm $\alpha_i(k,\gamma)$ for the case of non-identically distributed observations becomes:

$$E\left[ e^{\sum_{t=1}^{k}\sum_{j=1}^{N} \Lambda_{j,0}\left(N\lambda\Phi_{i,j}(k,t)\right)} \right].$$

Next, we bound the sum in the exponent of the previous equation, conditioned on the event $\mathcal{A}_s$, for a fixed $s$ in $\{0, 1, \ldots, k\}$, deriving a counterpart to Lemma 8.

Lemma 12 Let Assumptions A, B, and 3 hold. Then,

(a) For any realization of $W(t)$, $t = 1, 2, \ldots, k$:

$$\sum_{j=1}^{N} \Lambda_{j,0}\left(N\lambda\Phi_{i,j}(k,t)\right) \le \max_{j=1,\ldots,N} \Lambda_{j,0}(N\lambda), \quad \forall t = 1, \ldots, k.$$

(b) Consider a fixed $s$ in $\{0, 1, \ldots, k\}$. If the event $\mathcal{A}_s$ occurred, then, for $i = 1, \ldots, N$:

$$\sum_{j=1}^{N} \Lambda_{j,0}\left(N\lambda\Phi_{i,j}(k,t)\right) \le \sum_{j=1}^{N} \max\left( \Lambda_{j,0}\left(\lambda - \epsilon N\sqrt{N}\lambda\right), \Lambda_{j,0}\left(\lambda + \epsilon N\sqrt{N}\lambda\right) \right), \quad \forall t = 1, \ldots, s.$$

The remainder of the proof proceeds analogously to the proof of Theorem 5. ■



April 17, 2012 



DRAFT 






VI. Conclusion 

We analyzed the large deviations performance (error exponent) of consensus+innovations distributed detection over random networks. The sensors' observations have a generic (non-Gaussian) distribution, and are independent, not necessarily identically distributed over space, and i.i.d. in time. Our results hold assuming that the log-moment generating functions of each sensor's log-likelihood ratio are finite. We showed that the distributed detector exhibits a phase transition behavior with respect to the network connectivity, measured by $|\log r|$, where $r$ is the (exponential) rate of convergence in probability of the product $W(k)W(k-1) \cdots W(1)$ to the consensus matrix $J := (1/N)11^\top$. When $|\log r|$ is above the threshold, the distributed detector has the same error exponent as the optimal centralized detector. We further showed that the optimality threshold depends on the type of the distribution of the sensor observations. Numerical and analytical studies illustrated this dependence for Gaussian, Laplace, and binary distributions of the sensors' observations.

Appendix 

A. Proof of finiteness of the log-moment generating function under (7)-(10) 

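We now show that Assumption 2 holds, i.e., that $\Lambda_0(\lambda)$ is finite for any $\lambda \in \mathbb{R}$, if (7) and (9) hold. The other combinations for finiteness of $\Lambda_0(\cdot)$, when 1) either (7) or (8); and 2) either (9) or (10) hold, can be shown similarly, and, hence, for brevity, we do not consider these cases. Assume $m > 0$ (the case $m < 0$ can be treated analogously), fix $\lambda \in \mathbb{R}$ and consider:

$$\Lambda_0(\lambda) = \log \int_{y=-\infty}^{+\infty} \left(\frac{f_n(y-m)}{f_n(y)}\right)^{\lambda} f_n(y)\,dy, \quad (60)$$

where we use the fact that the density under $H_1$ is $f_1(y) = f_n(y - m)$, i.e., $f_1(\cdot)$ is the shifted density $f_n(\cdot)$ (of the noise) under $H_0$. With $f_n(y) = c\,e^{-g(y)}$, the integral in (60) is rewritten as:

$$\int_{y=-\infty}^{+\infty} \left(\frac{f_n(y-m)}{f_n(y)}\right)^{\lambda} f_n(y)\,dy = c\int_{y=-\infty}^{+\infty} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy$$

$$= c\int_{y=-\infty}^{0} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy + c\int_{y=0}^{+\infty} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy.$$

Now, by (7), for any $\epsilon_1 \in (0, \infty)$, there exists $M_1 \in (0, \infty)$, so that

$$(\rho_+ - \epsilon_1)\,y^{\tau_+} \le g(y) \le (\rho_+ + \epsilon_1)\,y^{\tau_+}, \quad \forall y \ge M_1.$$

Further, we have that:

$$(\rho_+ - \epsilon_1)(y - m)^{\tau_+} \le g(y - m) \le (\rho_+ + \epsilon_1)(y - m)^{\tau_+}, \quad \forall y \ge M_1 + m. \quad (61)$$

Also, for any $\epsilon_2 \in (0, \infty)$, there exists $M_2 \in (0, \infty)$, such that:

$$(1 - \epsilon_2)(y - m)^{\tau_+} \le y^{\tau_+} \le (1 + \epsilon_2)(y - m)^{\tau_+}, \quad \forall y \ge M_2. \quad (62)$$

Now, combining (61) and (62), we obtain:

$$(1 - \epsilon_2)\frac{\rho_+ - \epsilon_1}{\rho_+ + \epsilon_1} \le \frac{g(y)}{g(y - m)} \le (1 + \epsilon_2)\frac{\rho_+ + \epsilon_1}{\rho_+ - \epsilon_1}, \quad \forall y \ge M_3 := \max\{M_1 + m, M_2\}. \quad (63)$$

To upper bound the integral $\int_{y=0}^{+\infty} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy$, we note that, by (63), we can choose $M_3$ large enough, so that $\left|1 - \frac{g(y-m)}{g(y)}\right| \le \frac{\epsilon_3}{|\lambda|}$, $\forall y \ge M_3$, for arbitrary $\epsilon_3 \in (0, 1)$. Thus, we have:

$$\int_{y=0}^{+\infty} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy = \int_{y=0}^{M_3} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy + \int_{y=M_3}^{+\infty} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy$$

$$\le M_4 + \int_{y=M_3}^{+\infty} e^{-g(y)\left[1 - |\lambda|\left|1 - \frac{g(y-m)}{g(y)}\right|\right]}dy \le M_4 + \int_{y=M_3}^{+\infty} e^{-(1 - \epsilon_3)g(y)}dy \le M_4 + M_5 < \infty.$$

Finiteness of the integral $\int_{y=-\infty}^{0} e^{-g(y)\left[1 - \lambda\left(1 - \frac{g(y-m)}{g(y)}\right)\right]}dy$, using equation (9), can be proved in an analogous way. As $\lambda \in \mathbb{R}$ is arbitrary, we conclude that $\Lambda_0(\lambda) < +\infty$, $\forall \lambda \in \mathbb{R}$.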
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